Abstract: We present basic notions of Gold's learnability in the limit paradigm, first
presented in 1967, a formalization of the cognitive process by which a native speaker comes to
grasp the underlying grammar of his/her own native language by being exposed to well-
formed sentences generated by that grammar. Then we present Lambek grammars, a
formalism derived from categorial grammars which, although not as expressive as needed for
a full formalization of natural languages, is particularly well suited to implementing a
natural interface between syntax and semantics. In the last part of this work, we present a
learnability result for Rigid Lambek grammars from structured examples.

Key-words: Formal Learning Theory, machine learning, Lambek calculus, computational
linguistics, formal grammars

A Study on the Learnability
of Rigid Lambek Grammars

Summary: We present the basic notions of the paradigm of learnability in the limit for
a class of formal grammars, defined by Gold in 1967 as a possible formalization
of the cognitive process that allows a natural language to be learned from examples
of well-formed utterances. We then present Lambek grammars, a formalism
derived from categorial grammars which, although still insufficient to account for a number
of linguistic phenomena, has interesting properties with respect to the syntax-semantics
interface. Finally, we present a learnability result for rigid Lambek grammars
in Gold's learning model from structured examples.

Keywords: Formal learning theory, machine learning, Lambek calculus,
computational linguistics, formal grammars



Learnability for Rigid Lambek Grammars 






1 Introduction 

How comes it that human beings, whose contacts with the world are brief and 
personal and limited, are nevertheless able to know as much as they do know? 

Sir Bertrand Russell (quoted by Noam Chomsky in [Cho75]).

Formal Learning Theory was first defined in an article by E. M. Gold in 1967 (see [Gol67])
as a first effort to provide a rigorous formalization of grammatical inference, that is, the
process by which a learner, presented with a subset of the well-formed sentences of
a given language, comes to infer the grammar that generates it. The typical example of such
a process is a child who comes to master, in a completely spontaneous way and on
the basis of the relatively small amount of information provided by sentences uttered in her
cultural environment, the highly complex and subtle rules of her mother tongue, to the point
that she can utter correct and original sentences before her third year of life. In [OWdJM97]
such a formal framework is used in the broader context of the mathematical formalization of
any kind of inductive reasoning. In this case the learner is "the scientist" who, on the basis
of a finite amount of empirical evidence provided by natural phenomena, formulates scientific
hypotheses that could intensionally account for them.

After an initial skepticism about the grammars that could actually be learnt in Gold's
paradigm (a skepticism shared and in a way encouraged by Gold himself, who proved the
non-learnability in his model of the four classes of grammars of Chomsky's hierarchy), there
has recently been a renewal of interest in this computational model of learning. One of
the most recent results is Shinohara's (see [Shi90]), who proves that as soon as we bound
the number of rules in a context-sensitive grammar, the resulting class becomes learnable in
Gold's paradigm.

Lambek grammars have recently enjoyed renewed interest as a mathematical tool for
the description of certain linguistic phenomena, after having long been neglected since
their first definition in [Lam58]. Van Benthem was among the first to stress the singular
correspondence between Montague Semantics (see [Mon97]) and the notion of structure
associated with a sentence of a Lambek grammar. In particular, a recent work by Hans-Jörg
Tiede (see [Tie99]) has made clearer the notion of structure of a sentence in a Lambek
grammar, in contrast with a previous definition given by Buszkowski (see [Bus86]). In doing
so, he proves a meaningful result about Lambek grammars, namely that the class of
tree languages generated by Lambek grammars strictly contains the class of tree languages
generated by context-free grammars.

Section 2 introduces the basic notions of Gold's Learning Theory and provides a short
review of the most important known facts and results about it. Section 3 is a short introduction
to Lambek grammars: we give their definition and we present the features which make them
attractive from a computational linguistics point of view. Section 4 briefly presents the class
of rigid Lambek grammars, which is the object of our learning algorithm, along with some
basic properties and open questions. In Section 5 we present a learning algorithm for rigid
Lambek grammars from a structured input: the algorithm takes as its input a finite set of
what has been defined in Section 3 as proof tree structures. We prove convergence of the
algorithm and hence learnability for the class of rigid Lambek grammars.

RR n° 0123456789

Bonato

2 Grammatical Inference

2.1 Child's First Language Acquisition 

One of the most challenging goals for modern cognitive science is to provide a sound theory
accounting for the process by which any human being comes to master the highly complex
and articulated grammatical structure of her mother tongue in a relatively small amount of
time. Between the ages of 3 and 5 we witness in children a linguistic explosion, at the end
of which we can say that the child masters all the grammatical rules of her mother tongue,
and subsequent learning is nothing but lexicon acquisition. Moreover, cognitive psychologists
agree (see [OGL95]) that the learning process is almost completely based on
positive evidence provided by the cultural environment in which the child grows up: that
is, correct statements belonging to her mother tongue. Negative evidence (any information
or feedback given to the child to identify ill-formed sentences) is almost completely
absent and, in any case, does not seem to play any significant role in the process of learning
(see [Pin94]). Simply stated, the child acquires a language through exposure to correct
sentences coming from her linguistic environment and not through the negative feedback she
gets when she utters a wrong sentence.

Providing a formal framework in which to describe such an astounding ability to extract
highly articulated knowledge (i.e. the grammar of a human language) from a relatively small
amount of "raw" data (i.e. the statements of the language the child is exposed to during
her early childhood) was one of the major forces that led to the definition of a formal
learning theory such as the one we are going to describe in the following sections.

2.2 Gold's Model 

The process of a child's first language acquisition can be seen as an instance of the more 
general problem of grammatical inference. In particular we will restrict our attention to 
the process of inference from positive data only. Simply stated, it's the process by which a 
learner can acquire the whole grammatical structure of a formal language on the basis of 
well-formed sentences belonging to the target language. 

In 1967 Gold defined (see [Gol67]) the formal model for the process of grammatical
inference from positive data that will be adopted in the present work. In Gold's model,
grammatical inference is conceived as an infinite process during which a learner is presented
with an infinite stream of sentences $s_0, s_1, \ldots, s_n, \ldots$ belonging to the language to be
learnt, one sentence at a time. Each time the learner is presented with a new sentence $s_i$,
she formulates a new hypothesis $G_i$ about the underlying grammar that could
generate the language to which the sentences she has seen so far belong: since she is exposed to
an infinite number of sentences, she will conjecture an infinite number of (not necessarily
different) grammars $G_0, G_1, \ldots, G_n, \ldots$.




Two basic assumptions are made about the stream of sentences she is presented with:
(i) only grammatical sentences (i.e. sentences belonging to the target language) appear in the
stream, consistently with our commitment to the process of grammar induction from positive data
only; (ii) every possible sentence of the language must appear in the stream (which must
therefore be an enumeration of the elements of the language).

The learning process is considered successful when, from a given point onward, the
grammar conjectured by the learner no longer changes and it coincides with a grammar
that actually generates the target language. It is important to stress that one
can never know at any finite stage whether the learning has been successful, owing to
the infinite nature of the learning process itself: at each finite stage, no one can predict
whether the next sentence will change the current hypothesis. The goal of the theory
lies in devising a successful strategy for making guesses, that is, one which can be proved to
converge to a correct grammar after a finite (but unknown) amount of time (or positive
evidence, which is the same in our model). Gold called this criterion of successful learning
identification in the limit.

According to this criterion, a class of grammars is said to be learnable when there is a
learner that, for any language generated by a grammar belonging to the class and for any
enumeration of its sentences, successfully identifies a correct grammar that generates
the language. A good deal of current research on formal learning theory is devoted to
identifying non-trivial classes of languages which are learnable in Gold's model, or useful
criteria to deduce (un)learnability for a class of languages on the basis of some structural
property of the class.

As will become clear in the following sections, accepting this criterion for successful
learning means that we are not interested in when the learning has taken place: in fact there is
no effective way to decide whether it has at any finite stage. Our aim is to devise effective
procedures that, when applied to the infinite input stream of sentences, are guaranteed to
converge to the grammar we are looking for, if it exists.

3 Basic Notions 

We present here a short review of (Formal) Learning Theory as described in [Kan98], from which
we take the principal definitions and notational conventions.

3.1 Grammar Systems

The first step in the formalization of the learning process is the formal definition of both
the "cultural environment" in which this process takes place and the "positive evidence" the
learner is exposed to. To do this, we introduce the notion of grammar system.

Definition 3.1 (Grammar System) A grammar system is a triple $\langle \Omega, \mathcal{S}, L \rangle$, where

• $\Omega$ is a certain recursive set of finitary objects on which mechanical computations can
be carried out;

• $\mathcal{S}$ is a certain recursive subset of $\Sigma^*$, where $\Sigma$ is a given finite alphabet;

• $L$ is a function that maps elements of $\Omega$ to subsets of $\mathcal{S}$, i.e. $L : \Omega \to \wp(\mathcal{S})$.

We can think of $\Omega$ as the "hypothesis space" from which the learner takes her grammatical
conjectures, according to the positive examples she has been exposed to up to a certain finite
stage of the learning process. Elements of $\Omega$ are called grammars.

Positive examples presented to the learner belong to the set $\mathcal{S}$ (often we simply have
$\mathcal{S} = \Sigma^*$); its elements are called sentences, while its subsets are called languages. As will
become clear in the following sections, the nature of the elements of $\mathcal{S}$ strongly influences the
process of learning: intuitively, we can guess that the more information they bear, the easier
the learning process is, if it is possible at all.

The function $L$ maps each grammar $G$ belonging to $\Omega$ into a subset of $\mathcal{S}$ which is
designated as the language generated by $G$. That is why we often refer to $L$ as the naming
function. The question of whether $s \in L(G)$ holds for given $s \in \mathcal{S}$ and $G \in \Omega$ is referred
to as the universal membership problem.

Example 3.2 Let $\Sigma$ be any finite alphabet and let $DFA$ be the set of deterministic finite
automata whose input alphabet is $\Sigma$. For every $M \in DFA$, let $L(M)$ be the set of strings
over $\Sigma$ accepted by $M$. Then $\langle DFA, \Sigma^*, L \rangle$ is a grammar system.

Example 3.3 Let $\Sigma$ be any finite alphabet and let $RegExpr$ be the set of regular expressions
over $\Sigma$. For every $r \in RegExpr$, let $L(r)$ be the regular language represented by $r$. Then
$\langle RegExpr, \Sigma^*, L \rangle$ is a grammar system.
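
As an informal illustration of Examples 3.2 and 3.3, the following Python sketch treats a regular expression as a grammar and decides the membership question $s \in L(r)$ by full-string matching. The names `member` and `language_up_to` are ours, and Python's `re` syntax (which is richer than classical regular expressions) is used only as a convenient stand-in for $RegExpr$; it is an assumption of this sketch, not part of the formal definition.

```python
import re
from itertools import product


def member(sentence, grammar):
    """Decide whether `sentence` belongs to L(grammar) by full-string matching."""
    return re.fullmatch(grammar, sentence) is not None


def language_up_to(grammar, alphabet, max_len):
    """Return the sentences of L(grammar) over `alphabet` of length <= max_len."""
    found = set()
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            word = "".join(letters)
            if member(word, grammar):
                found.add(word)
    return found


if __name__ == "__main__":
    # Finite approximation of the language named by the grammar "(ab)*".
    print(sorted(language_up_to("(ab)*", "ab", 4)))
```

Here `member` plays the role of a decision procedure for the universal membership problem, while `language_up_to` only approximates the naming function $L$, since $L(r)$ is in general infinite.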

Example 3.4 (Angluin, 1980) Let $\Sigma$ be any finite alphabet, and let $Var$ be a countably
infinite set of variables, disjoint from $\Sigma$. A pattern over $\Sigma$ is any element of
$(\Sigma \cup Var)^+$: let $Pat$ be the set of patterns over $\Sigma$. For every $p \in Pat$, let $L(p)$ be
the set of strings that can be obtained from $p$ by uniformly replacing each variable $x$ occurring
in $p$ by some string $w \in \Sigma^+$. The triple $\langle Pat, \Sigma^+, L \rangle$ is a grammar system.

Figure 1: Grammatical Inference

3.2 Learning Functions, Convergence, Learnability

Having formally defined both the set of possible "guesses" the learner can make and the set
of positive examples she is exposed to, we need a formal notion for the mechanism by
which the learner formulates hypotheses, on the basis of finite sequences of well-formed
sentences of a given language, about the grammar that generates them.

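Definition 3.5 (Learning Function) Let $\langle \Omega, \mathcal{S}, L \rangle$ be a grammar system. A learning
function is a partial function that maps finite sequences of sentences to grammars,

\[ \varphi : \bigcup_{k \geq 1} \mathcal{S}^k \rightharpoonup \Omega, \]

where $\mathcal{S}^k$ denotes the set of $k$-ary sequences of sentences.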

A learning function can be seen as a formal model of the cognitive process by which a learner 
conjectures that a given finite set of sentences belongs to the language generated by a certain 
grammar. Since it's partial, possibly the learner cannot infer any grammar from the stream 
of sentences she has seen so far. 

According to the informal model outlined in Section 2.2, in a successful learning process
we require the guesses made by the learner to remain the same from a certain point onward
in the infinite process of learning. That is to say, there must be a finite stage (even if we
don't know which one) after which the grammar inferred on the basis of all the positive
examples the learner has seen so far is always the same. This informal idea can be made
precise by introducing the notion of convergence for a learning function:

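Definition 3.6 (Convergence) Let $\langle \Omega, \mathcal{S}, L \rangle$ be a grammar system, $\varphi$ a learning function,

\[ (s_i)_{i \in \mathbb{N}} = \langle s_0, s_1, \ldots \rangle \]

an infinite sequence of sentences belonging to $\mathcal{S}$, and let

\[ G_i = \varphi(\langle s_0, \ldots, s_i \rangle) \]

for any $i \in \mathbb{N}$ such that $\varphi$ is defined on the finite sequence $\langle s_0, \ldots, s_i \rangle$. $\varphi$ is said
to converge to $G$ on $(s_i)_{i \in \mathbb{N}}$ if there exists $n \in \mathbb{N}$ such that, for each $i \geq n$, $G_i$ is
defined and $G_i = G$ (equivalently, if $G_i = G$ for all but finitely many $i \in \mathbb{N}$).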

As we've already pointed out, one can never say exactly if and when convergence of a 
learning function to a certain grammar has taken place: this is due to the infinite nature of 
the process by which a learner gets to learn a given language in Gold's model. At any finite 
stage of the learning process there's no way to know whether the next sentence the learner 
will see causes the current hypothesis to change or not. 

We will say that a class of grammars is learnable when there exists a learning function
which, for each language generated by its grammars, converges to a correct underlying
grammar on any enumeration of the sentences of the language. Formally:

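Definition 3.7 (Learning $\mathcal{G}$) Let $\langle \Omega, \mathcal{S}, L \rangle$ be a grammar system, and $\mathcal{G} \subseteq \Omega$ a given set
of grammars. A learning function $\varphi$ is said to learn $\mathcal{G}$ if the following condition holds:

• for every language $L \in L(\mathcal{G}) = \{L(G) \mid G \in \mathcal{G}\}$,

• and for every infinite sequence $(s_i)_{i \in \mathbb{N}}$ that enumerates $L$ (i.e., $\{s_i \mid i \in \mathbb{N}\} = L$),

there exists a $G \in \mathcal{G}$ with $L(G) = L$ such that $\varphi$ converges to $G$ on $(s_i)_{i \in \mathbb{N}}$.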

So we will say that a given learning function converges to a single grammar, but that it
learns a class of grammars. Learning a single grammar, indeed, could be trivially
implemented by a learning function that, for any given sequence of sentences as input, always
returns that grammar.

Definition 3.8 (Learnability of a Class of Grammars) A class $\mathcal{G}$ of grammars is called
learnable if and only if there exists a learning function that learns $\mathcal{G}$. It is called effectively
learnable if and only if there is a computable learning function that learns $\mathcal{G}$.

Obviously effective learnability implies learnability.

Example 3.9 Let $\langle \Omega, \mathcal{S}, L \rangle$ be any grammar system and let $\mathcal{G} = \{G_0, G_1, G_2\} \subseteq \Omega$, and
suppose there are elements $w_1, w_2 \in \mathcal{S}$ such that $w_1 \in L(G_1) - L(G_0)$ and
$w_2 \in L(G_2) - (L(G_1) \cup L(G_0))$. Then it is easy to verify that the following learning function
learns $\mathcal{G}$:

\[ \varphi(\langle s_0, \ldots, s_i \rangle) = \begin{cases} G_2 & \text{if } w_2 \in \{s_0, \ldots, s_i\}, \\ G_1 & \text{if } w_1 \in \{s_0, \ldots, s_i\} \text{ and } w_2 \notin \{s_0, \ldots, s_i\}, \\ G_0 & \text{otherwise.} \end{cases} \]
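
The learning function of Example 3.9 can be sketched in a few lines of Python. The grammar names `"G0"`, `"G1"`, `"G2"` and the concrete witnesses below are hypothetical choices made only for illustration, under the assumption that $L(G_0) = \{a\}$, $L(G_1) = \{a, b\}$ and $L(G_2) = \{a, b, c\}$.

```python
def make_learner(w1, w2):
    """Build the learning function of Example 3.9 for the witnesses w1 and w2."""
    def phi(prefix):
        seen = set(prefix)
        if w2 in seen:
            return "G2"
        if w1 in seen:  # at this point w2 has not been seen
            return "G1"
        return "G0"
    return phi


# Hypothetical instance: w1 = "b" separates G1 from G0, w2 = "c" separates G2 from both.
phi = make_learner(w1="b", w2="c")

# Guesses on the successive initial segments of a prefix of an enumeration of {a, b}.
sample = ["a", "b", "a", "b"]
guesses = [phi(sample[: i + 1]) for i in range(len(sample))]
```

On this sample the guess changes once, when the witness `"b"` first appears, and stays the same afterwards, which is the behaviour that convergence requires.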

Example 3.10 Let us consider the grammar system $\langle CFG, \Sigma^*, L \rangle$ of context-free grammars
over the alphabet $\Sigma$. Let $\mathcal{G}$ be the subclass of $CFG$ consisting of grammars whose rules are
all of the form

\[ S \to w, \]

where $w \in \Sigma^*$. We can easily see that $L(\mathcal{G})$ is exactly the class of finite languages over $\Sigma$.
Let us define the learning function $\varphi$ as

\[ \varphi(\langle s_0, \ldots, s_i \rangle) = (\Sigma, \{S\}, S, P), \]

where

\[ P = \{S \to s_0, \ldots, S \to s_i\}. \]

Then $\varphi$ learns $\mathcal{G}$.
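
A minimal Python sketch of the learning function of Example 3.10 follows. The encoding of a grammar as a tuple of frozen sets, the default alphabet and the sample stream are assumptions made for the sake of the example; the sketch only inspects a finite prefix of an enumeration, so it illustrates convergence without establishing it.

```python
def phi(prefix, alphabet=frozenset("ab")):
    """Return the grammar (Sigma, {S}, S, P) with one rule S -> s per sentence seen."""
    productions = frozenset(("S", sentence) for sentence in prefix)
    return (alphabet, frozenset({"S"}), "S", productions)


def hypotheses(stream_prefix):
    """Apply phi to every non-empty initial segment of a finite stream prefix."""
    return [phi(stream_prefix[: i + 1]) for i in range(len(stream_prefix))]


# A finite prefix of an enumeration (with repetitions) of the language {a, ab, bb}.
stream = ["a", "ab", "a", "bb", "ab", "a", "bb"]
guesses = hypotheses(stream)

# Index of the last change of hypothesis within the observed prefix.
last_change = max(i for i in range(1, len(guesses)) if guesses[i] != guesses[i - 1])
```

Once every sentence of the finite target language has appeared, repetitions no longer alter the set of rules, so the guess remains fixed from that stage onward.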

3.3 Structural Conditions for (Un)Learnability 

One of the first important results in learnability theory presented in [Gol67] was a sufficient
condition to deduce the unlearnability of a class $\mathcal{G}$ of grammars on the basis of some formal
properties of the class of languages $\mathcal{C} = L(\mathcal{G})$ (see Theorem 3.14). We present here some
structural conditions sufficient to deduce (un)learnability for a class of grammars. Such
results are useful to get a deeper understanding of the general problem of learnability for a
class of grammars.

3.3.1 Existence of a Limit Point 

Let's define the notion of limit point for a class of languages: 

Definition 3.11 (Limit Point) A class $\mathcal{C}$ of languages has a limit point if there exists an
infinite sequence $(L_n)_{n \in \mathbb{N}}$ of languages in $\mathcal{C}$ such that

\[ L_0 \subset L_1 \subset \cdots \subset L_n \subset \cdots \]

and there exists another language $L \in \mathcal{C}$ such that

\[ L = \bigcup_{n \in \mathbb{N}} L_n. \]

The language $L$ is called a limit point of $\mathcal{C}$.

Figure 2: A limit point for a class of languages.

Lemma 3.12 (Blum and Blum's locking sequence lemma, 1975)

Suppose that a learning function $\varphi$ converges on every infinite sequence that enumerates a
language $L$. Then there is a finite sequence $\langle w_0, \ldots, w_l \rangle$ (called a locking sequence for $\varphi$
and $L$) with the following properties:

(i) $\{w_0, \ldots, w_l\} \subseteq L$,

(ii) for every finite sequence $\langle v_0, \ldots, v_m \rangle$, if $\{v_0, \ldots, v_m\} \subseteq L$, then
$\varphi(\langle w_0, \ldots, w_l \rangle) = \varphi(\langle w_0, \ldots, w_l, v_0, \ldots, v_m \rangle)$.

Intuitively, the previous lemma (see [BB75]) states that if a learning function
converges, then there must exist a finite sequence of input sentences that "locks" the guess
made by the learner on the grammar the learning function converges to: that is to say,
the learning function always returns the same grammar for any finite input sequence that
extends that locking sequence with sentences of the language.

The locking sequence lemma yields one of the first unlearnability criteria in Gold's
learnability framework:

Theorem 3.13 If $L(\mathcal{G})$ has a limit point, then $\mathcal{G}$ is not learnable.

An easy consequence of the previous theorem is the following

Theorem 3.14 (Gold, 1967) For any grammar system, a class $\mathcal{G}$ of grammars is not
learnable if $L(\mathcal{G})$ contains all finite languages and at least one infinite language.

Proof sketch. Let $L_1 \subset L_2 \subset \cdots$ be a sequence of finite languages and let
$L_\infty = \bigcup_{i=1}^{\infty} L_i$. Suppose there were a learning function $\varphi$ that learns the class
$\{L \mid L \text{ is finite}\} \cup \{L_\infty\}$. Then $\varphi$ must identify any finite language in a finite
amount of time. But then we can build an infinite sequence of sentences that forces $\varphi$ to make
an infinite number of mistakes: we first present $\varphi$ with enough examples from $L_1$ to make it
guess $L_1$; then with enough examples from $L_2$ to make it guess $L_2$, and so on. Note that all
our examples belong to $L_\infty$.
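
The construction used in the proof sketch can be simulated on a small scale. In the Python sketch below, all names are hypothetical: `set_learner` is one particular learner that identifies finite languages (it conjectures exactly the sentences seen so far, and languages stand in for grammars), and `chain(n)` is an assumed increasing chain $L_n = \{a, aa, \ldots, a^n\}$. The simulation only shows the first few forced changes of mind for this one learner; it is an illustration of the argument, not a proof of the theorem.

```python
def set_learner(prefix):
    """Hypothetical learner for finite languages: conjecture exactly the sentences seen."""
    return frozenset(prefix)


def chain(n):
    """The finite language L_n = {a, aa, ..., a^n}; the chain is strictly increasing."""
    return frozenset("a" * k for k in range(1, n + 1))


def adversarial_prefix(learner, stages, max_steps=1000):
    """Feed sentences of L_1, then of L_2, ... until the learner guesses each L_n in turn."""
    prefix, guesses = [], []
    for n in range(1, stages + 1):
        target = chain(n)
        words = sorted(target, key=len)
        step = 0
        while not prefix or learner(prefix) != target:
            if step >= max_steps:
                raise RuntimeError("learner did not identify L_%d" % n)
            prefix.append(words[step % len(words)])
            step += 1
        guesses.append(learner(prefix))
    return prefix, guesses


prefix, guesses = adversarial_prefix(set_learner, stages=4)
```

Every sentence placed in `prefix` belongs to $L_\infty$, yet each stage forces a new guess, so continuing the construction forever yields an enumeration of $L_\infty$ on which this learner never converges.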
3.3.2 (In)Finite Elasticity

As we have seen in the previous section, the existence of a limit point for a class of languages
implies the existence of an "infinite ascending chain" of languages like the one described by
the following, weaker condition:

Definition 3.15 (Infinite Elasticity) A class $\mathcal{C}$ of languages is said to have infinite
elasticity if there exists an infinite sequence $(s_n)_{n \in \mathbb{N}}$ of sentences and an infinite sequence
$(L_n)_{n \in \mathbb{N}}$ of languages in $\mathcal{C}$ such that for every $n \in \mathbb{N}$,

\[ s_n \notin L_n \]

and

\[ \{s_0, \ldots, s_n\} \subseteq L_{n+1}. \]

The following definition, although trivial, identifies an extremely useful criterion to deduce
learnability for a class of grammars:

Definition 3.16 (Finite Elasticity) A class $\mathcal{C}$ of languages is said to have finite elasticity
if it does not have infinite elasticity.

Dana Angluin proposed in [Ang80] a characterization of the notion of learnability in
a "restrictive setting" which is of paramount importance in formal learning theory. Such
restrictions concern the membership problem and the recursive enumerability of the class
of grammars whose learnability is at issue. Let $\langle \Omega, \mathcal{S}, L \rangle$ be a grammar system and
$\mathcal{G} \subseteq \Omega$ a class of grammars, and let us define:

Condition 3.17 There is an algorithm that, given $s \in \mathcal{S}$ and $G \in \mathcal{G}$, determines whether
$s \in L(G)$.

Condition 3.18 $\mathcal{G}$ is a recursively enumerable class of grammars.

Condition 3.17 is usually referred to as decidability of the universal membership problem,
and Condition 3.18 as the recursive enumerability condition. Such restrictions are not unusual
in concrete situations where learnability is at issue, so they do not significantly affect the
usefulness of the following characterization of the notion of learnability under such restrictive
conditions.

Theorem 3.19 (Angluin 1980) Let $\langle \Omega, \mathcal{S}, L \rangle$ be a grammar system for which both
Conditions 3.17 and 3.18 hold, and let $\mathcal{G}$ be a recursively enumerable subset of $\Omega$. Then $\mathcal{G}$ is
learnable if and only if there exists a computable partial function
$\psi : \Omega \times \mathbb{N} \rightharpoonup \mathcal{S}$ such that:

(i) for all $n \in \mathbb{N}$, $\psi(G, n)$ is defined if and only if $G \in \mathcal{G}$ and $L(G) \neq \emptyset$;

(ii) for all $G \in \mathcal{G}$, $T_G = \{\psi(G, n) \mid n \in \mathbb{N}\}$ is a finite subset of $L(G)$;

(iii) for all $G, G' \in \mathcal{G}$, if $T_G \subseteq L(G')$, then $L(G') \not\subset L(G)$.

Note: From this point onward, unless otherwise stated, we will restrict our attention to
classes of grammars that fulfill both Condition 3.17 and Condition 3.18.

Angluin's theorem introduces the notion of $T_G$ as the tell-tale set for a given language.
Learnability in the restricted environment is characterized by the existence of a mechanism
(the function $\psi$) to enumerate all the sentences belonging to such a finite subset of the target
language. Moreover, a tell-tale set for a given grammar $G$ is such that if it is included in
the language generated by another grammar $G'$, then

• either $L(G)$ is included in $L(G')$,

• or $L(G')$ contains sentences that do not belong to $L(G)$.

Otherwise stated, it is never the case that $T_G \subseteq L(G') \subset L(G)$. The point of the tell-tale
subset is that once the strings of that subset have appeared among the sample strings, we
need not fear overgeneralization in guessing the grammar $G$. This is because the true answer,
even if it is not $L(G)$, cannot be a proper subset of $L(G)$. This means that a learner who
has seen only the sentences belonging to the tell-tale set for a given grammar $G$ is justified
in conjecturing $G$ as the underlying grammar, since doing so never results in overshooting
or inconsistency.
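
For a finite family of finite languages, condition (iii) of Theorem 3.19 can be checked by brute force. The Python sketch below is only an illustration under that finiteness assumption; the function name `is_telltale` and the three-language family are hypothetical.

```python
def is_telltale(candidate, name, family):
    """Check that `candidate` is a tell-tale set for family[name] within `family`."""
    language = family[name]
    candidate = frozenset(candidate)
    if not candidate <= language:
        return False
    # No language of the family may contain the candidate and be a proper subset
    # of the language: T_G <= L(G') < L(G) must never hold.
    return not any(candidate <= other < language for other in family.values())


# Hypothetical family: L(A) = {a}, L(B) = {a, b}, L(C) = {a, b, c}.
family = {
    "A": frozenset({"a"}),
    "B": frozenset({"a", "b"}),
    "C": frozenset({"a", "b", "c"}),
}
```

In this family `{"c"}` is a tell-tale set for `C`, whereas `{"a"}` is not: a learner who guessed `C` after seeing only `"a"` would overgeneralize whenever the target is `A` or `B`.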




Figure 3: A tell-tale set for L(G). 

As a consequence of Angluin's theorem, Wright proved in [Wri89] the following

Theorem 3.20 (Wright, 1989) Let $\langle \Omega, \mathcal{S}, L \rangle$ and $\mathcal{G}$ be as in Theorem 3.19. If $L(\mathcal{G})$ has
finite elasticity, then $\mathcal{G}$ is learnable.

In such a restricted framework, therefore, the task of proving learnability for a certain class
of grammars can be reduced to the usually simpler task of proving its finite elasticity.

Thanks to Wright's theorem we can establish the following useful implications:

\[ \begin{array}{rcl}
L(\mathcal{G}) \text{ has finite elasticity} & \Rightarrow^{*} & \mathcal{G} \text{ is learnable} \\
L(\mathcal{G}) \text{ has a limit point} & \Rightarrow & \mathcal{G} \text{ is unlearnable} \\
\mathcal{G} \text{ is unlearnable} & \Rightarrow^{*} & L(\mathcal{G}) \text{ has infinite elasticity}
\end{array} \]

The implications marked $\Rightarrow^{*}$ depend on the decidability of universal membership and
the recursive enumerability of the class of grammars at issue, as stated in Conditions 3.17 and
3.18.

3.3.3 Kanazawa's Theorem 

The following theorem (see [Kan98]), which is a generalization of a previous theorem by
Wright, provides a sufficient condition for a class of grammars to have finite elasticity, and
therefore to be learnable. A relation $R \subseteq \Sigma^* \times T^*$ is said to be finite-valued if and only if
for every $s \in \Sigma^*$, the set $\{u \in T^* \mid sRu\}$ is finite.

Theorem 3.21 Let $\mathcal{M}$ be a class of languages over $T$ that has finite elasticity, and let
$R \subseteq \Sigma^* \times T^*$ be a finite-valued relation. Then $\mathcal{C} = \{R^{-1}[M] \mid M \in \mathcal{M}\}$ also has finite
elasticity.

This theorem is a powerful tool to prove finite elasticity (and therefore learnability) for
classes of grammars. Once we have proved finite elasticity for a certain class of grammars
in the "straight" way, we can obtain a proof of finite elasticity for other classes of grammars,
thanks to the relatively loose requirements of the theorem. All we have to do is to devise a
"smart" finite-valued relation such that the class for which we want to prove finite
elasticity is the inverse image, under this relation, of the class for which finite elasticity has
already been established.
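
To make the inverse image concrete, the following Python sketch computes $R^{-1}[M]$ restricted to a finite set of candidate strings, assuming the usual reading $R^{-1}[M] = \{s \mid sRu \text{ for some } u \in M\}$. The relation `erase_c`, which relates a string only to the string obtained by erasing every `c`, is a hypothetical example of a finite-valued relation; none of these names come from the theorem itself.

```python
from itertools import product


def erase_c(s):
    """Hypothetical finite-valued relation R: s is related only to s with every 'c' erased."""
    return {s.replace("c", "")}


def inverse_image(relation, language, candidates):
    """Return the members of `candidates` related by `relation` to some element of `language`."""
    target = set(language)
    return {s for s in candidates if relation(s) & target}


def strings_up_to(alphabet, max_len):
    """All strings over `alphabet` of length at most `max_len`."""
    return {"".join(t) for n in range(max_len + 1) for t in product(alphabet, repeat=n)}


# Inverse image of M = {ab} among the strings over {a, b, c} of length at most 3.
preimage = inverse_image(erase_c, {"ab"}, strings_up_to("abc", 3))
```

The relation is finite-valued because each string is related to exactly one string, which is the only requirement the theorem places on $R$.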



Figure 4: Kanazawa's theorem (the relation $R$, a language $M$, and its inverse image $R^{-1}[M]$).

3.4 Constraints on Learning Functions

In the definition of learnability nothing is said about the behaviour of learning functions 
apart from convergence to a correct grammar. Further constraints can be imposed: one can 
choose a certain learning strategy. Intuitively, a strategy refers to a policy, or preference, 
for choosing hypotheses. Formally, a strategy can be analyzed as merely picking a subset of 
possible learning functions. Strategies can be grouped by numerous properties. We choose 
to group them by restrictiveness, defined as follows: 

Definition 3.22 (Restrictiveness) If a strategy constrains the class of learnable languages 
it is said to be restrictive. 

For example, strategies are grouped as computational constraints (computability, time 
complexity), constraints on potential conjectures (consistency), constraints on the relation 
between conjectures (conservatism), etc. Since the classes we will be discussing are all classes 
of recursive languages, "restrictive" will be taken to mean "restrictive for classes of recursive 
languages". 

3.4.1 Non-restrictive Constraints 

The proof of Theorem 3.19 implies that in a grammar system where universal membership
is decidable, a recursively enumerable class of grammars is learnable if and only if there is a
computable learning function that learns it order-independently and prudently, and that is
responsive and consistent on this class.

Definition 3.23 (Order-independent Learning) A learning function $\varphi$ learns $\mathcal{G}$
order-independently if for all $L \in L(\mathcal{G})$, there exists $G \in \mathcal{G}$ such that $L(G) = L$ and for all
infinite sequences $(s_i)_{i \in \mathbb{N}}$ that enumerate $L$, $\varphi$ converges on $(s_i)_{i \in \mathbb{N}}$ to $G$.

Intuitively this seems a reasonable strategy. There does not seem to be an a priori reason
why the order of presentation should influence the final choice of hypothesis. Moreover,
it has already been proved (see [JORS99]) that in any grammar system, a class
of grammars is learnable if and only if there is a computable learning function that learns
it order-independently.

Definition 3.24 (Exact Learning) A learning function $\varphi$ learns $\mathcal{G}$ exactly if for all $\mathcal{G}'$
such that $\varphi$ learns $\mathcal{G}'$, $L(\mathcal{G}') \subseteq L(\mathcal{G})$.

In other words, the learning function does not learn any language outside the class
$L(\mathcal{G})$. This is not really a constraint on learning functions, but on the relation between a
class of languages and a learning function. For every learning function there exists a class
that it learns exactly. The reason for this constraint is the idea that children only learn
languages that have at least a certain minimal expressiveness. If we want to model language
learning, we want learning functions to learn a chosen class exactly. There seems to be
empirical support for this idea. Some of it comes from studies of children raised in pidgin
dialects, some from studies of sensory-deprived children (see [Pin94]).

Definition 3.25 (Prudent Learning) A learning function $\varphi$ learns $\mathcal{G}$ prudently if $\varphi$ learns
$\mathcal{G}$ and $\mathrm{range}(\varphi) \subseteq \mathcal{G}$.

Note that prudent learning implies exact learning. This reduces to the condition that a 
learning function should only produce a hypothesis if the learning function can back up its 
hypotheses, i.e. if the hypothesis is confirmed by the input, the learning function is able to 
identify the language. 

Definition 3.26 (Responsive Learning) A learning function $\varphi$ is responsive on $\mathcal{G}$ if for
any $L \in L(\mathcal{G})$ and for any finite sequence $\langle s_0, \ldots, s_i \rangle$ of elements of $L$
(i.e. $\{s_0, \ldots, s_i\} \subseteq L$), $\varphi(\langle s_0, \ldots, s_i \rangle)$ is defined.

This constraint can be regarded as the complement of prudent learning: if all sentences
found in the input are in a language in the class of languages learned, the learning function
should always produce a hypothesis.

Definition 3.27 (Consistent Learning) A learning function $\varphi$ is consistent on $\mathcal{G}$ if for
any $L \in L(\mathcal{G})$ and for any finite sequence $\langle s_0, \ldots, s_i \rangle$ of elements of $L$, either
$\varphi(\langle s_0, \ldots, s_i \rangle)$ is undefined or $\{s_0, \ldots, s_i\} \subseteq L(\varphi(\langle s_0, \ldots, s_i \rangle))$.

The idea behind this constraint is that all the data given should be explained by the chosen
hypothesis. It should be self-evident that this is a desirable property. Indeed, one would
almost expect it to be part of the definition of learning. However, learning functions that
are not consistent are not necessarily trivial. If, for example, the input is noisy, it would not
be unreasonable for a learning function to ignore certain data because it considers them
unreliable. Also, it is a well-known fact that children do not learn languages consistently.

3.4.2 Restrictive Constraints 

Definition 3.28 (Set-Drivenness) A learning function $\varphi$ learns $\mathcal{G}$ in a set-driven way if
$\varphi(\langle s_0, \ldots, s_i \rangle)$ is determined by $\{s_0, \ldots, s_i\}$ or, more precisely, if the following
holds: whenever $\{s_0, \ldots, s_i\} = \{u_0, \ldots, u_j\}$, $\varphi(\langle s_0, \ldots, s_i \rangle)$ is defined if and only
if $\varphi(\langle u_0, \ldots, u_j \rangle)$ is defined, and if they are defined, they are equal.

It is easy to see that set-drivenness implies order-independence. Set-driven learning could be
very loosely described as order-independent learning with the addition of ignoring "doubles"
in the input. It is obvious that this is a nice property for a learning function to have:
one would not expect the choice of hypothesis to be influenced by repeated presentation of
the same data. The assumption here is that the order of presentation and the number of
repetitions are essentially arbitrary, i.e. they carry no information that is of any use to the
learning function. One can devise situations where this is not the case.

Definition 3.29 (Conservative Learning) A learning function $\varphi$ is conservative if for
any finite sequence $\langle s_0, \ldots, s_i \rangle$ of sentences and for any sentence $s_{i+1}$, whenever $\varphi(\langle s_0, \ldots, s_i \rangle)$
is defined and $s_{i+1} \in \mathrm{L}(\varphi(\langle s_0, \ldots, s_i \rangle))$, $\varphi(\langle s_0, \ldots, s_i, s_{i+1} \rangle)$ is also defined and

\[
\varphi(\langle s_0, \ldots, s_i \rangle) = \varphi(\langle s_0, \ldots, s_i, s_{i+1} \rangle).
\]

At first glance conservatism may seem a desirable property. Why change your hypothesis
if there is no direct need for it? One could imagine cases, however, where it would not 
be unreasonable for a learning function to change its mind, even though the new data fits 
in the current hypothesis. Such a function could for example make reasonable but "wild" 
guesses which it could later retract. The function could "note" after a while that the inputs 
cover only a proper subset of its conjectured language. While such behaviour will sometimes 
result in temporarily overshooting, such a function could still be guaranteed to converge to 
the correct hypothesis in the limit. 

It is a common assumption in cognitive science that human cognitive processes can be 
simulated by computer. This would lead one to believe that children's learning functions 
are computable. The corresponding strategy is the set of all partial and total recursive 
functions. Since this is only a subset of all possible functions, the computability strategy is 
a non-trivial hypothesis, but not necessarily a restrictive one.

The computability constraint interacts with consistency (see [Ful88]):

Proposition 3.30 There is a collection of languages that is identifiable by a computable 
learning function but by no consistent, computable learning function. 

The computability constraint also interacts with conservative learning (see [Ang80]):

Proposition 3.31 (Angluin, 1980) There is a collection of languages that is identifiable 
by a computable learning function but by no conservative, computable learning function. 

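Definition 3.32 (Monotonicity) The learning function $\varphi$ is monotone increasing if for all
finite sequences $\langle s_0, \ldots, s_n \rangle$ and $\langle s_0, \ldots, s_{n+m} \rangle$, whenever $\varphi(\langle s_0, \ldots, s_n \rangle)$ and $\varphi(\langle s_0, \ldots, s_{n+m} \rangle)$
are defined,

\[
\mathrm{L}(\varphi(\langle s_0, \ldots, s_n \rangle)) \subseteq \mathrm{L}(\varphi(\langle s_0, \ldots, s_{n+m} \rangle)).
\]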

When a learning function that is monotone increasing changes its hypothesis, the language 
associated with the previous hypothesis will be (properly) included in the language associ- 
ated with the new hypothesis. There seems to be little or no empirical support for such a 
constraint. 

Definition 3.33 (Incrementality, Kanazawa 1998) The learning function $\varphi$ is incremental
if there exists a computable function $\psi$ such that

\[
\varphi(\langle s_0, \ldots, s_{n+1} \rangle) \simeq \psi(\varphi(\langle s_0, \ldots, s_n \rangle), s_{n+1}).
\]

An incremental learning function does not need to store previous data. All it needs is current 
input, Sn, and its previous hypothesis. A generalized form of this constraint, called memory 
limitation, limits access for a learning function to only n previous elements of the input 
sequence. This seems reasonable from an empirical point of view; it seems improbable that 
children (unconsciously) store all utterances they encounter.

Note that, on an infinite sequence enumerating a language $L$ in $\mathrm{L}(\mathcal{G})$, a conservative learning
function $\varphi$ learning $\mathcal{G}$ never outputs any grammar that generates a proper superset of $L$.

Let $\varphi$ be a conservative and computable learning function that is responsive and consistent
on $\mathcal{G}$, and learns $\mathcal{G}$ prudently. Then, whenever $\{s_0, \ldots, s_n\} \subseteq L$ for some $L \in \mathrm{L}(\mathcal{G})$,
$\mathrm{L}(\varphi(\langle s_0, \ldots, s_n \rangle))$ must be a minimal element of the set $\{L \in \mathrm{L}(\mathcal{G}) \mid \{s_0, \ldots, s_n\} \subseteq L\}$. This
implies the following condition:

Condition 3.34 There is a computable partial function $\psi$ that takes any finite set $D$ of
sentences and maps it to a grammar $\psi(D) \in \mathcal{G}$ such that $\mathrm{L}(\psi(D))$ is a minimal element of
$\{L \in \mathrm{L}(\mathcal{G}) \mid D \subseteq L\}$ whenever the latter set is non-empty.

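Definition 3.35 Let $\psi$ be a computable function satisfying condition 3.34. Define the learning
function $\varphi$ as follows:

\[
\varphi(\langle s_0 \rangle) \simeq \psi(\{s_0\}),
\]

\[
\varphi(\langle s_0, \ldots, s_{i+1} \rangle) \simeq
\begin{cases}
\varphi(\langle s_0, \ldots, s_i \rangle) & \text{if } s_{i+1} \in \mathrm{L}(\varphi(\langle s_0, \ldots, s_i \rangle)),\\
\psi(\{s_0, \ldots, s_{i+1}\}) & \text{otherwise.}
\end{cases}
\]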

Under certain conditions the function just defined is guaranteed to learn $\mathcal{G}$; one such case is
where $\mathrm{L}(\mathcal{G})$ has finite elasticity.

Proposition 3.36 Let $\mathcal{G}$ be a class of grammars such that $\mathrm{L}(\mathcal{G})$ has finite elasticity, and a
computable function $\psi$ satisfying condition 3.34 exists. Then the learning function $\varphi$ defined
in definition 3.35 learns $\mathcal{G}$.
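The construction of the learner from a minimal-language function can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy class of languages (for each integer k >= 1, the set of positive multiples of k), the names `psi`, `in_language` and `phi`, and the choice of returning the whole list of conjectures are ours and do not come from the text. In this toy class a grammar is just the integer k, and the greatest common divisor of a finite set of sentences names the smallest language of the class that contains it.

```python
from functools import reduce
from math import gcd


def psi(data):
    """Toy minimal-language function (assumption): the grammar k = gcd(data)
    generates the smallest language 'positive multiples of k' containing data."""
    return reduce(gcd, data)


def in_language(k, sentence):
    """Membership test for the toy language generated by grammar k."""
    return sentence > 0 and sentence % k == 0


def phi(sequence):
    """Learner built from psi: keep the current conjecture while the new
    sentence fits it, otherwise recompute psi on all data seen so far.
    Returns the list of conjectures, one per input sentence."""
    seen = []
    hypothesis = None
    conjectures = []
    for sentence in sequence:
        seen.append(sentence)
        if hypothesis is None or not in_language(hypothesis, sentence):
            hypothesis = psi(set(seen))
        conjectures.append(hypothesis)
    return conjectures


print(phi([12, 18, 6, 30, 9]))
```

On the sequence 12, 18, 6, 30, 9 the conjecture changes only when a sentence falls outside the current language, which is the conservative behaviour described above.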



4 Is Learning Theory Powerful Enough? 
4.1 First Negative Results 

One of the main and apparently discouraging consequences of theorem 3.14 proved by
Gold in the original article wherein he laid the foundations of Formal Learning Theory was 
that none of the four classes of Chomsky's Hierarchy is learnable under the criterion of 
identification in the limit. Such a first negative result has been taken for a long time as 
a proof that identifying languages from positive data according to his identification in the 
limit criterion was too hard a task. Gold himself looks quite pessimistic about the future of 
the theory he has just defined along its main directions: 

However, the results presented in the last section show that only the most trivial 
class of languages considered is learnable... [Gol67]

4.2 Angluin's Results

The first example of a non-trivial class of learnable grammars was discovered by Dana Angluin
(see [Ang80]). If Pat is defined as in the earlier example, we can prove that the class of all
pattern languages has finite elasticity and, therefore, it is learnable. Furthermore, such a
learnable class of grammars was also the first example of an interesting class of grammars
that cross-cuts the Chomsky Hierarchy, therefore showing that Chomsky's is but one of
many meaningful possible classifications for formal grammars.

4.3 Shinohara's Results 

Initial pessimism about the effective usefulness of Gold's notion of identification in the limit was
definitely abandoned after an impressive result by Shinohara, who proved (see [Shi90]) that
k-rigid context-sensitive grammars (context-sensitive grammars over a finite alphabet $\Sigma$ with
at most k rules) have finite elasticity for any k. Since the universal membership problem
for context-sensitive grammars is decidable, that class of grammars is learnable. This is a
particular case of his more general result about finite elasticity for what he calls monotonic
formal systems.

4.4 Kanazawa's Results 

Makoto Kanazawa in [Kan98] makes another decisive step toward bridging the existing
gap between Formal Learning Theory and computational linguistics. Indeed, he gets some
important results on the learnability of some non-trivial subclasses of Classical Categorial
Grammars (also known as AB Grammars). Analogously to what is done in [Shi90], he proves
that as soon as we bound the maximum number of types a classical categorial grammar
assigns to a word, we get subclasses which are effectively learnable: in particular, he
proves effective learnability for the class of k-valued Classical Categorial Grammars, both
from structures and from strings.

In the first case, each string of the language the learner is presented with comes with
additional information about the underlying structure induced by the grammar formalism 
that generates the language. The availability of such additional information for each string 
is somewhat in contrast with Gold's model of learning and gives rise to weaker results. 
On the other hand, psychological plausibility of the process is preserved by the fact that 
such an underlying structure can be seen as some kind of semantic information that could 
be available to the child learning the language from the very early stages of her cognitive 
development. 

4.5 Our Results 

The present work pushes Kanazawa's results a little further in the direction of proving the 
effective learnability for more and more powerful and expressive classes of formal languages. 
In particular, we will be able to prove learnability for the class of Rigid Lambek Grammars
(see the corresponding chapter below) and to show an effective algorithm to learn them on the basis of a structured
input. Much is left to be done along this direction of research, since even a formal theory 
for Rigid Lambek Grammars is still under-developed. However, our results confirm once 
again that initial pessimism toward this paradigm of learning was largely unjustified, and 
that even quite a complex and linguistically motivated formalism like Lambek Grammars 
can be learnt according to it.

5 Lambek Grammars

In 1958 Joachim Lambek proposed (see [Lam58]) to extend the formalism of Classical Categorial
Grammars (sometimes referred to also as Basic Categorial Grammars or BCGs) by a
deductive system to derive type-change rules. A BCG is basically a finite relation between
the finite set of symbols of the alphabet (usually referred to as words) and a finite set of
types. Combinatory properties of each word are completely determined by the shape of its
types, which can be combined according to a small set of rules, fixed once and for all BCGs.
Lambek's proposal marked the irruption of logic into grammars: Lambek grammars come
with a whole deductive system that allows the type of a symbol to be replaced with a weaker
type.

It was first realized by van Benthem (in [vB87]) that the proofs of these type change
principles carry important information about their semantic interpretation, following the 
Curry-Howard isomorphism. Thus, the notion of a proof theoretical grammar was proposed 
that replaces formal grammars (see fChoSll) with deductive systems and that includes a 
systematic semantics for natural languages based on the relationship between proof theory 
and type theory. Thus, rather than considering grammatical categories as unanalyzed prim- 
itives, they are taken to be formulas constructed from atoms and connectives, and rather 
than defining grammars with respect to rewrite rules, grammars are defined by the rules of 
inference governing the connectives used in the syntactic categories. 

Due to the renewed interest in categorial grammars in the field of computational lin- 
guistics, Lambek (Categorial) Grammars (LCGs) are currently considered as a promising 
formalism. They enjoy the relative simplicity of a tightly constrained formalism as that for 
BCGs, together with the linguistically attractive feature of full lexicalization. 

Besides, although Pentus proved (in [Pen97]) that Lambek grammars generate exactly the
context-free (string) languages, in [Tie99] it has been shown that their strong generative
capacity is greater than that of context-free grammars. These features make them an in- 
teresting subject for our inquiry about their properties with respect to Gold's Learnability 
Theory. 

5.1 Classical Categorial Grammars 

The main idea which lies behind the theory of Categorial Grammars is to conceive of a grammar
not as a set of rules which generate the strings of the language, but as a system which assigns
to each symbol of the alphabet a set of types which can be combined according to a small
set of rules, fixed for the whole class of Classical Categorial Grammars.

A context-free grammar à la Chomsky is made of a set of rules that generate all the strings
of a given language in a "top-down" fashion, starting from an initial symbol which identifies 
all the well-formed strings. On the contrary, a categorial grammar accepts a sequence of 
symbols of the alphabet as a well-formed string if and only if a sequence of types assigned to 
them reduces (in a "bottom-up" fashion) according to a fixed set of rules, to a distinguished 
type which designates well-formed strings.

Definition 5.1 (Classical Categorial Grammar)

A Classical Categorial Grammar (henceforth CCG) is a quadruple $\langle \Sigma, Pr, F, s \rangle$, such that

• $\Sigma$ is a finite set (the terminal symbols or vocabulary),

• $Pr$ is a finite set (the non-terminal symbols or atomic categories),

• $F$ is a function from $\Sigma$ to finite subsets of $Tp$, where $Tp$ is the smallest set such that:

1. $Pr \subseteq Tp$

2. if $A, B \in Tp$, then $(A/B), (A\backslash B) \in Tp$

If $F(a) = \{A_1, \ldots, A_n\}$ we usually write $G : a \mapsto A_1, \ldots, A_n$.

• $s \in Pr$ is the distinguished atomic category.

In a CCG, the combinatory properties of types are uniquely determined by their structure. There
are only two modes of type combination: so-called (according to the notation introduced
in [Lam58] and almost universally adopted) Backward Application:

\[
A,\ A\backslash B \Rightarrow B
\]

and Forward Application:

\[
B/A,\ A \Rightarrow B.
\]

A non-empty sequence of types $A_1, \ldots, A_n$ is said to derive a type $B$, that is

\[
A_1, \ldots, A_n \Rightarrow B,
\]

if repeated applications of the rules of Backward and Forward Application to the sequence
$A_1, \ldots, A_n$ result in $B$.

In order to define the language generated by a CCG we have to establish a criterion to
identify a string belonging to that language. This is done by the following

Definition 5.2 The binary relation

\[
\Rightarrow\ \subseteq Tp^* \times Tp^*
\]

is defined as follows. Let $A, B \in Tp$, let $\alpha, \beta \in Tp^*$:

\[
\begin{array}{rcl}
\alpha\ A\ A\backslash B\ \beta & \Rightarrow & \alpha\ B\ \beta\\
\alpha\ B/A\ A\ \beta & \Rightarrow & \alpha\ B\ \beta
\end{array}
\]

The language generated by a CCG $G$ is the set

\[
\{a_1 \cdots a_n \in \Sigma^* \mid \text{for } 1 \le i \le n,\ \exists A_i,\ G : a_i \mapsto A_i, \text{ and } A_1 \ldots A_n \Rightarrow^* s\}
\]

where $\Rightarrow^*$ is the reflexive, transitive closure of $\Rightarrow$.

Informally, we can say that a string of symbols belongs to the language generated by a CCG
if there exists a derivation of the distinguished category $s$ out of at least one sequence of
types assigned by the grammar to the symbols of the string.

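Example 5.3 The following grammar generates the language $\{a^n b^n \mid n > 0\}$:

\[
\begin{array}{rcl}
a & : & s/B,\\
b & : & B,\ s\backslash B
\end{array}
\]

Here is a derivation for $a^3 b^3$:

\[
\begin{array}{rl}
 & s/B \;\; s/B \;\; s/B \;\; B \;\; s\backslash B \;\; s\backslash B\\
\Rightarrow & s/B \;\; s/B \;\; s \;\; s\backslash B \;\; s\backslash B\\
\Rightarrow & s/B \;\; s/B \;\; B \;\; s\backslash B\\
\Rightarrow & s/B \;\; s \;\; s\backslash B\\
\Rightarrow & s/B \;\; B\\
\Rightarrow & s
\end{array}
\]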
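Membership in the language of a CCG can be checked mechanically. The following Python sketch is one possible way to do it, not a procedure taken from the text: it fills a table of derivable types for every substring, combining adjacent spans with Forward and Backward Application only. The tuple encoding of types and the names `fwd`, `bwd`, `combine`, `accepts` and `LEXICON` are assumptions made for the illustration; the lexicon is the one of example 5.3.

```python
def fwd(result, arg):
    """Encode the type result/arg (looks for arg on its right)."""
    return ("/", result, arg)


def bwd(arg, result):
    """Encode the type arg\\result (looks for arg on its left)."""
    return ("\\", arg, result)


def combine(x, y):
    """Types obtained from the adjacent types x, y by one application step."""
    out = set()
    if isinstance(x, tuple) and x[0] == "/" and x[2] == y:
        out.add(x[1])          # Forward Application:  B/A, A => B
    if isinstance(y, tuple) and y[0] == "\\" and y[1] == x:
        out.add(y[2])          # Backward Application: A, A\B => B
    return out


def accepts(lexicon, word, start="s"):
    """True if some type assignment for `word` reduces to `start`."""
    n = len(word)
    if n == 0 or any(symbol not in lexicon for symbol in word):
        return False
    table = {(i, i + 1): set(lexicon[symbol]) for i, symbol in enumerate(word)}
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            cell = set()
            for k in range(i + 1, j):
                for x in table[(i, k)]:
                    for y in table[(k, j)]:
                        cell |= combine(x, y)
            table[(i, j)] = cell
    return start in table[(0, n)]


# Lexicon of example 5.3: a : s/B and b : B, s\B
LEXICON = {"a": {fwd("s", "B")}, "b": {"B", bwd("s", "B")}}

print(accepts(LEXICON, "aaabbb"), accepts(LEXICON, "aab"))
```

The table-filling strategy is chosen here only because it avoids enumerating reduction sequences one by one; any exhaustive search over the two application rules would decide the same question.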



Weak generative capacity of CCGs was characterized by Gaifman (see [BH64]):

Theorem 5.4 (Gaifman, 1964) The set of languages generated by CCGs coincides with 
the set of context-free languages. 

From the proof of Gaifman's theorem, we immediately obtain the following normal form
theorem:

Theorem 5.5 (Gaifman normal form) Every categorial grammar is equivalent to a categorial
grammar which assigns only categories of the form

\[
A, \quad A/B, \quad (A/B)/C.
\]

Example 5.6 A CCG equivalent to that in example 5.3 in Gaifman normal form is the following:

\[
\begin{array}{rcl}
a & : & s/B,\ (s/B)/s\\
b & : & B
\end{array}
\]

and here is a derivation for $a^3 b^3$:

\[
\dfrac{\dfrac{(s/B)/s \quad \dfrac{\dfrac{(s/B)/s \quad \dfrac{s/B \quad B}{s}}{s/B} \quad B}{s}}{s/B} \quad B}{s}
\]



In the previous example we make use for the first time of a "natural deduction" notation
for derivations, which in the present work will replace the cumbersome notation used in
example 5.3.

5.2 Extensions of Classical Categorial Grammars

As stated in the previous section, the CCG formalism comes with only two reduction rules which
yield smaller types out of larger ones. Montague's work on semantics (see [Mon97]) led to
the definition of two further "type-raising" rules, by which it is possible to construct new
syntactic categories out of atomic ones. We can extend the definition of CCGs as presented
in the previous section by adding to the former definition two new type change rules:

\[
\begin{array}{rcl}
\alpha\ B\ \beta & \Rightarrow & \alpha\ (A/B)\backslash A\ \beta\\
\alpha\ B\ \beta & \Rightarrow & \alpha\ A/(B\backslash A)\ \beta
\end{array}
\]

Other type-change rules that were proposed are the composition rules:

\[
\dfrac{A/B \quad B/C}{A/C} \qquad \dfrac{C\backslash B \quad B\backslash A}{C\backslash A}
\]

and the Geach Rules:

\[
\dfrac{A/B}{(A/C)/(B/C)} \qquad \dfrac{B\backslash A}{(C\backslash B)\backslash (C\backslash A)}
\]

We can extend the formalism of CCG by adding to definition 5.2 any type change rule
we need to formalize specific phenomena in natural language. Such a rule-based approach
was adopted by Steedman (see [Ste93]), who enriches the classical categorial grammar formalism
with a finite number of type-change rules. On the other hand, as will be made clear in
the following section, Lambek's approach is a deductive one: he defines a calculus in which
type change rules arise as a consequence of the operations performed on the types.

One could ask why we should follow the deductive rather than the rule-based approach. 
To begin with, as proved in [Zie89], Lambek Calculus is not finitely axiomatizable, that
is to say that by adding a finite number of type-change rules to the formalism of CCG one
cannot derive all the type change rules provable in the Lambek Calculus. Moreover, the two 
approaches are very different under a theoretical viewpoint. 

From a linguistic perspective, Steedman pointed out that there is no reason why we
should stick to a deductive approach instead of to a rule based one: he underlines the 
importance of introducing ad hoc rules to formalize specific linguistic phenomena. Why 
should we subordinate the use of specific type change rules to their derivability in some 
calculus? 

One of the most compelling reasons to do so is given by Moortgat (see [Moo97]), who
stresses the systematicity of the relation between syntax and semantics provided in a deductive
framework. Also, Lambek Calculus enjoys an important property: it is sound and
complete with respect to free semigroup models, i.e. an interpretation with respect to formal
languages. That is to say, rules that are not deducible in Lambek Calculus are not sound,
and so they can be considered as linguistically implausible.

5.3 (Associative) Lambek Calculus

Categorial grammars can be analyzed from a proof theoretical perspective by observing the
close connection between the "slashes" of a categorial grammar and implication in intuitionistic
logic. The rule that allows us to infer that if $w$ is of type $A/B$ and $v$ is of type $B$, then
$wv$ is of type $A$, behaves like the modus ponens rule of inference in logic. On the basis of this
similarity Lambek proposed an architecture for categorial grammars based on two levels:

• a syntactic calculus, i.e. a deductive system in which statements of the form

\[
A_1, \ldots, A_n \vdash B,
\]

to be read "from the types $A_1, \ldots, A_n$ we can infer type $B$", can be proved;

• a categorial grammar as presented in section 5.1, wherein the relation $\Rightarrow$ is changed
to allow any type change rule that could be deduced at the previous level.

In doing so, instead of adding a finite number of type change rules to our grammar, every
type change rule that can be derived in the Lambek Calculus is added to the categorial
grammar.

The following formalizations for Lambek Calculus are presented according, respectively, 
to the formalism of sequent calculus and to the formalism of natural deduction. Note that in 
the present work we will use the expression Lambek Calculus to refer to product-free Lambek 
Calculus: indeed we will never make use of the product '•' (which corresponds to the tensor
of linear logic).

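Definition 5.7 The sequent calculus formalization of the Lambek calculus contains the axiom
[ID] and the rules of inference [/R], [/L], [\R], [\L], and [Cut]:

\[
\dfrac{}{A \vdash A}\,[ID]
\]

\[
\dfrac{\Gamma, A \vdash B}{\Gamma \vdash B/A}\,[/R]
\qquad
\dfrac{\Gamma \vdash A \quad \Delta, B, \Pi \vdash C}{\Delta, B/A, \Gamma, \Pi \vdash C}\,[/L]
\]

\[
\dfrac{A, \Gamma \vdash B}{\Gamma \vdash A\backslash B}\,[\backslash R]
\qquad
\dfrac{\Gamma \vdash A \quad \Delta, B, \Pi \vdash C}{\Delta, \Gamma, A\backslash B, \Pi \vdash C}\,[\backslash L]
\]

\[
\dfrac{\Delta \vdash B \quad \Gamma, B, \Pi \vdash A}{\Gamma, \Delta, \Pi \vdash A}\,[Cut]
\]

Note: in [/R] and [\R] there is a side condition stipulating that $\Gamma \neq \emptyset$.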

The side condition imposed for the [/R] and [\R] rules formalizes the fact that in Lambek
Calculus one is not allowed to cancel all the premises from the left-hand side of a derivation.
Otherwise stated, in Lambek Calculus there are no deductions of the form

\[
\vdash A.
\]

In keeping with our interpretation of Lambek Calculus as a deductive system to derive
the type of a sequence of symbols of the alphabet out of the types of each symbol, such a
derivation makes no sense, since it would mean assigning a type to an empty sequence of
words.

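Definition 5.8 The natural deduction formalization of the Lambek Calculus is defined as
follows:

\[
A\ [ID]
\]

\[
\dfrac{A/B \quad B}{A}\,[/E]
\qquad
\dfrac{B \quad B\backslash A}{A}\,[\backslash E]
\]

\[
\dfrac{\begin{array}{c}\cdots\ [B]\\ \vdots\\ A\end{array}}{A/B}\,[/I]
\qquad
\dfrac{\begin{array}{c}[B]\ \cdots\\ \vdots\\ A\end{array}}{B\backslash A}\,[\backslash I]
\]

Note: in the [/I] and [\I] rules the cancelled assumption is always, respectively, the rightmost
and the leftmost uncancelled assumption, and there must be at least one other uncancelled
hypothesis.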

Both formalisms have advantages and disadvantages. However, due to the close connection
between natural deduction proofs and $\lambda$-terms, and because the tree-like structure of
deductions resembles derivation trees of grammars, the natural deduction version will be
the primary object of study in the present work.

For later use in defining the structure of a sentence in a Lambek grammar, we introduce here
the notion of derivation in Lambek calculus.
A derivation of $B$ from $A_1, \ldots, A_n$ is a certain kind of unary-binary branching tree that
encodes a proof of $A_1, \ldots, A_n \vdash B$. Each node of a derivation is labeled with a type, and
each internal node has an additional label which, for Lambek grammars, is either /E, \E, /I,
or \I and that indicates which Lambek calculus rule is used at each step of a derivation. For
each occurrence of an introduction rule there must be a corresponding previously unmarked
leaf type $A$ which must be marked as $[A]$ (that corresponds to "discharging" an assumption
in natural deduction).

The set of derivations is inductively defined as follows:

Definition 5.9 Let $A, B \in Tp$ and $\Gamma, \Delta \in Tp^+$.

• $A$ (the tree consisting of a single node labeled by $A$) is a derivation of $A$ from $A$.

• "Backslash elimination". If

\[
\begin{array}{c}\Gamma\\ \vdots\\ A\end{array}
\]

is a derivation of $A$ from $\Gamma$ and

\[
\begin{array}{c}\Delta\\ \vdots\\ A\backslash B\end{array}
\]

is a derivation of $A\backslash B$ from $\Delta$, then

\[
\dfrac{\begin{array}{c}\Gamma\\ \vdots\\ A\end{array} \quad \begin{array}{c}\Delta\\ \vdots\\ A\backslash B\end{array}}{B}\ \backslash E
\]

is a derivation of $B$ from $\Gamma, \Delta$.

• "Backslash introduction". If

\[
\begin{array}{c}A, \Gamma\\ \vdots\\ B\end{array}
\]

is a derivation of $B$ from $(A, \Gamma)$, then

\[
\dfrac{\begin{array}{c}[A], \Gamma\\ \vdots\\ B\end{array}}{A\backslash B}\ \backslash I
\]

is a derivation of $A\backslash B$ from $\Gamma$. The leaf labeled by $[A]$ is called a discharged leaf.

• "Slash elimination". If

\[
\begin{array}{c}\Gamma\\ \vdots\\ B/A\end{array}
\]

is a derivation of $B/A$ from $\Gamma$ and

\[
\begin{array}{c}\Delta\\ \vdots\\ A\end{array}
\]

is a derivation of $A$ from $\Delta$, then

\[
\dfrac{\begin{array}{c}\Gamma\\ \vdots\\ B/A\end{array} \quad \begin{array}{c}\Delta\\ \vdots\\ A\end{array}}{B}\ /E
\]

is a derivation of $B$ from $\Gamma, \Delta$.

• "Slash introduction". If

\[
\begin{array}{c}\Gamma, A\\ \vdots\\ B\end{array}
\]

is a derivation of $B$ from $(\Gamma, A)$, then

\[
\dfrac{\begin{array}{c}\Gamma, [A]\\ \vdots\\ B\end{array}}{B/A}\ /I
\]

is a derivation of $B/A$ from $\Gamma$. The leaf labeled by $[A]$ is called a discharged leaf.

Example 5.10 The following example is a derivation of $y/(x\backslash y)$ from $x$ (which proves one
of the two type-raising rules in Lambek Calculus):

\[
\dfrac{\dfrac{x \quad [x\backslash y]}{y}\ \backslash E}{y/(x\backslash y)}\ /I
\]



5.4 Non-associative Lambek Calculus 

Lambek Calculus, as defined in the previous section, is implicitly associative. In order to
use Lambek calculus to describe some linguistic phenomena we have to forbid associativity,
so that the hierarchical embedding of hypotheses is respected. Another linguistically
attractive feature of non-associative Lambek calculus is that it provides useful logical
support for semantics, but at the same time it prohibits transitivity, which sometimes leads to
overgeneration.

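Definition 5.11 The natural deduction formalization of the non-associative Lambek Calculus
(SND) has the following axioms and rules of inference, presented in the sequent format:

\[
A \vdash A
\]

\[
\dfrac{\Gamma \vdash A/B \quad \Delta \vdash B}{(\Gamma, \Delta) \vdash A}\,[/E]
\qquad
\dfrac{\Gamma \vdash B \quad \Delta \vdash B\backslash A}{(\Gamma, \Delta) \vdash A}\,[\backslash E]
\]

\[
\dfrac{(\Gamma, B) \vdash A}{\Gamma \vdash A/B}\,[/I]
\qquad
\dfrac{(B, \Gamma) \vdash A}{\Gamma \vdash B\backslash A}\,[\backslash I]
\]

Note: in [/I] and [\I] there is a side condition stipulating that $\Gamma \neq \emptyset$.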

5.5 Normalization and Normal Forms 

As one can easily see, in Lambek Calculus there are infinitely many proofs for any deduction
$A_1, \ldots, A_n \vdash B$. Since, as will be extensively explained in section 6, proofs in Lambek
Calculus play a decisive role in defining the notion of structure for a sentence generated
by a Lambek grammar, such an arbitrary proliferation of proofs for deductions is quite
undesirable.

The following definition introduces a useful relation between proofs in Lambek Calculus
that formalizes our idea of a "minimal" proof for any deduction. It provides two normalization
schemes that can be applied to a derivation to produce a "simpler" derivation of the
same result.

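Definition 5.12 The relation $>_1$ between proofs in the natural deduction formalization of
Lambek Calculus is defined in the following way:

\[
\dfrac{\begin{array}{c}\vdots\\ A\end{array} \quad \dfrac{\begin{array}{c}[A]\\ \vdots\\ B\end{array}}{A\backslash B}\,[\backslash I]}{B}\,[\backslash E]
\;>_1\;
\begin{array}{c}\vdots\\ A\\ \vdots\\ B\end{array}
\qquad
\dfrac{\dfrac{\begin{array}{c}[A]\\ \vdots\\ B\end{array}}{B/A}\,[/I] \quad \begin{array}{c}\vdots\\ A\end{array}}{B}\,[/E]
\;>_1\;
\begin{array}{c}\vdots\\ A\\ \vdots\\ B\end{array}
\]

\[
\dfrac{\dfrac{[B] \quad \begin{array}{c}\vdots\\ B\backslash A\end{array}}{A}\,[\backslash E]}{B\backslash A}\,[\backslash I]
\;>_1\;
\begin{array}{c}\vdots\\ B\backslash A\end{array}
\qquad
\dfrac{\dfrac{\begin{array}{c}\vdots\\ A/B\end{array} \quad [B]}{A}\,[/E]}{A/B}\,[/I]
\;>_1\;
\begin{array}{c}\vdots\\ A/B\end{array}
\]

The symbol $>$ stands for the reflexive and transitive closure of $>_1$. Relation $>_1$ is usually referred to
as $\beta$-$\eta$-conversion, while $>$ as $\beta$-$\eta$-reduction.

The relation $>$ satisfies the following properties (see [Wan93], [Roo91]):

Theorem 5.13 (Wansing, 1993) The relation $>$ is confluent (in the Church-Rosser sense),
i.e. if $\delta_1 > \delta_2$ and $\delta_1 > \delta_3$, then there exists a $\delta_4$ such that $\delta_2 > \delta_4$ and $\delta_3 > \delta_4$.

Theorem 5.14 (Roorda, 1991) The relation $>$ is both weakly and strongly normalizing,
that is, every proof can be reduced to normal form and every reduction terminates after at
most a finite number of steps.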

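Definition 5.15 ($\beta$-$\eta$-normal form) A proof tree for the Lambek Calculus is said to be in
$\beta$-$\eta$-normal form if none of its subtrees is of the form

\[
\dfrac{\dfrac{\begin{array}{c}[B]\\ \vdots\\ A\end{array}}{A/B}\,[/I] \quad B}{A}\,[/E]
\qquad
\dfrac{B \quad \dfrac{\begin{array}{c}[B]\\ \vdots\\ A\end{array}}{B\backslash A}\,[\backslash I]}{A}\,[\backslash E]
\]

\[
\dfrac{\dfrac{A/B \quad [B]}{A}\,[/E]}{A/B}\,[/I]
\qquad
\dfrac{\dfrac{[B] \quad B\backslash A}{A}\,[\backslash E]}{B\backslash A}\,[\backslash I]
\]

5.6 Basic Facts about Lambek Calculus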

Let's summarize here some meaningful properties for Lambek calculus, which is: 

• intuitionistic: only one formula is allowed on the right-hand side of a deduction. This 
means there is neither involutive negation, nor disjunction; 

• linear: so-called structural rules of logics are not allowed: two equal hypotheses can't 
be considered as only one, and on the other hand we are not allowed to "duplicate" 
hypotheses at will. Lambek calculus is what we call a resource-aware logic, wherein
hypotheses must be considered as consumable resources; 

• non-commutative: hypotheses don't commute among them, that is, the implicit oper- 
ator "•" in this calculus is not commutative. This is what makes possible the existence 
of the two "implications" (/ and \), the first one consuming its right argument, the 
second one its left argument. 

Since Lambek proved a cut-elimination theorem for his calculus (see [Lam58]), among
the many consequences of the normalization theorems there is the subformula property,
that is:

Proposition 5.16 Every formula that occurs in a normal form natural deduction proof or
cut-free sequent calculus proof is either a subformula of the (uncancelled) assumptions or of
the conclusion;

and decidability for Lambek calculus:

Proposition 5.17 Derivability in the Lambek Calculus is decidable. 

In fact, given a sequent to prove in Lambek calculus, the cut-elimination property authorizes us
to look for a cut-free proof. But if the sequent comes from the application of a rule other
than cut, this can only be done in a finite number of different ways, and in any case we
have to prove one or two smaller (i.e. with fewer symbols) sequents. This is enough to prove
decidability for Lambek calculus.
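The decidability argument suggests a direct backward proof search. The Python sketch below is an illustration under our own assumptions, not an algorithm given in the text: it tries the axiom, then the right rules, then every possible split for the left rules of the cut-free sequent calculus of definition 5.7, and it terminates because every premise contains fewer connectives than the conclusion. The tuple encoding of types and the names `over`, `under` and `derivable` are hypothetical.

```python
from functools import lru_cache


def over(b, a):
    """Encode the type b/a."""
    return ("/", b, a)


def under(a, b):
    """Encode the type a\\b."""
    return ("\\", a, b)


@lru_cache(maxsize=None)
def derivable(gamma, c):
    """Decide whether the sequent gamma |- c has a cut-free proof.
    gamma is a non-empty tuple of types, c is a type."""
    if not gamma:                       # no sequents with empty antecedent
        return False
    if gamma == (c,):                   # axiom [ID]
        return True
    if isinstance(c, tuple):
        if c[0] == "/" and derivable(gamma + (c[2],), c[1]):     # [/R]
            return True
        if c[0] == "\\" and derivable((c[1],) + gamma, c[2]):    # [\R]
            return True
    n = len(gamma)
    for i, t in enumerate(gamma):
        if not isinstance(t, tuple):
            continue
        if t[0] == "/":                 # [/L]: t = b/a, argument on the right
            b, a = t[1], t[2]
            for k in range(i + 2, n + 1):
                if derivable(gamma[i + 1:k], a) and \
                        derivable(gamma[:i] + (b,) + gamma[k:], c):
                    return True
        else:                           # [\L]: t = a\b, argument on the left
            a, b = t[1], t[2]
            for k in range(0, i):
                if derivable(gamma[k:i], a) and \
                        derivable(gamma[:k] + (b,) + gamma[i + 1:], c):
                    return True
    return False


# Type raising is derivable, its converse is not.
print(derivable(("x",), over("y", under("x", "y"))))
print(derivable((over("y", under("x", "y")),), "x"))
```

Memoization with `lru_cache` is only a convenience here; the search is exhaustive and makes no attempt at efficiency.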

Theorem 5.14 states that any proof has a normal form and theorem 5.13 that this normal
form is unique. This doesn't mean that there is a unique normal form proof for any
deduction. The following theorem by van Benthem sheds light on this point:

Theorem 5.18 (van Benthem) For any sequent

\[
A_1, \ldots, A_n \vdash B
\]

there are only finitely many different normal form proofs in the Lambek Calculus.

This is quite an unsatisfactory result: we still have a one-to-many correspondence be- 
tween a sequent and its normal proofs. This leads to what is generally known as the problem 
of spurious ambiguities for Lambek grammars. 

5.7 Lambek Grammars 

A Lambek grammar extends the traditional notion of categorial grammars as presented in
section 5.1 by a whole deductive system in the following way:

• a lexicon assigns to each word $w_i$ a finite set of types

\[
F(w_i) = \{t_i^1, \ldots, t_i^{k_i}\} \in \wp(Tp);
\]

• the language generated by this fully lexicalized grammar is the set of all the sequences
$w_1 \cdots w_n$ of words of the lexicon such that for each $w_i$ there exists a type $t_i \in F(w_i)$
such that

\[
t_1, \ldots, t_n \vdash s
\]

is provable in Lambek calculus.

Formally:

Definition 5.19 (Lambek grammar) A Lambek grammar is a triple $G = \langle \Sigma, s, F \rangle$, such
that

• $\Sigma$ is a finite set (the vocabulary),

• $s$ is the distinguished category (a propositional variable),

• $F : \Sigma \to \wp(Tp)$ is a function which maps each symbol of the alphabet into the set of its
types. If $F(a) = \{A_1, \ldots, A_n\}$ we write $G : a \mapsto A_1, \ldots, A_n$.

For $w \in \Sigma^*$, $w = a_1 \cdots a_n$, we say that $G$ accepts $w$ if there is a proof in Lambek calculus of

\[
A_1, \ldots, A_n \vdash s
\]

with $G : a_i \mapsto A_i$ for each $i$.

The language generated by a Lambek grammar $G$ is

\[
\mathrm{L}(G) = \{a_1 \cdots a_n \in \Sigma^* \mid \text{for } 1 \le i \le n,\ \exists A_i,\ G : a_i \mapsto A_i \text{ and } A_1, \ldots, A_n \vdash s\}.
\]

Example 5.20 Let $\Sigma$ = {Mary, cooked, the, beans} be our alphabet and $s$ our distinguished
category. Let's take $F$ such that

\[
\begin{array}{rcl}
\textit{Mary} & : & np\\
\textit{cooked} & : & (np\backslash s)/np\\
\textit{the} & : & np/n\\
\textit{beans} & : & n
\end{array}
\]

Then Mary cooked the beans belongs to the language generated by this grammar, because in
Lambek calculus we can prove:

\[
np,\ (np\backslash s)/np,\ np/n,\ n \vdash s
\]
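The sequent above can be checked by hand with three elimination steps. The following worked derivation is a sketch written with the natural deduction rules of definition 5.8; the layout is ours and is not taken from the original text. First the and beans combine into a noun phrase, then cooked takes that noun phrase as its object, and finally the subject Mary is consumed on the left:

\[
\dfrac{np \quad \dfrac{(np\backslash s)/np \quad \dfrac{np/n \quad n}{np}\,[/E]}{np\backslash s}\,[/E]}{s}\,[\backslash E]
\]

No introduction rule is needed for this sentence, so the same derivation is already available in a Classical Categorial Grammar with the same lexicon.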

Weak generative capacity for associative Lambek grammars was characterized (see [Pen97])
by the following celebrated theorem, one of the finest and most recent achievements in this
field:

Theorem 5.21 (Pentus, 1997) The languages generated by associative Lambek grammars 
are exactly the context-free languages. 

Analogously, for non-associative Lambek grammars Buszkowski proved (see [Bus86]):

Theorem 5.22 (Buszkowski, 1986) The languages generated by non-associative Lambek 
grammars are exactly the context-free languages. 

6 Proofs as Grammatical Structures 

In this section we will introduce the notion of structure for a sentence generated by a Lambek
grammar. On the basis of a recent work by Hans-Joerg Tiede (see [Tie99]), who proved some
important theorems about the tree language of proof trees in Lambek calculus, we will adopt
as the underlying structure of a sentence in a Lambek grammar a proof of its well-formedness
in Lambek calculus. We will see later how this choice affects the process of learning
a rigid Lambek grammar on the basis of structured positive data.

6.1 (Partial) Parse Trees for Lambek Grammars 

Just as a derivation encodes a proof of $A_1, \ldots, A_n \vdash B$, the notion of parse tree introduced
by the following definition encodes a proof of $a_1 \cdots a_n \in \mathrm{L}(G)$ where $G$ is a Lambek grammar
and $a_1, \ldots, a_n$ are symbols of its alphabet.

Definition 6.1 Let $G = \langle \Sigma, s, F \rangle$ be a Lambek grammar, then

• if $\mathcal{D}$ is a derivation of $B$ from $A_1, \ldots, A_n$, and $a_1, \ldots, a_n$ are symbols of the alphabet $\Sigma$
such that $G : a_i \mapsto A_i$ for $1 \le i \le n$, the result of attaching $a_1, \ldots, a_n$, from left to
right in this order, to the undischarged leaf nodes of $\mathcal{D}$ is a partial parse tree of $G$.

\[
\begin{array}{c}
a_1 \;\cdots\; a_n\\
A_1 \;\cdots\; A_n\\
\vdots\\
B
\end{array}
\]

• A parse tree of $G$ is a partial parse tree of $G$ whose root node is labeled by the distinguished
category $s$.

If $a_1 \cdots a_n$ is the string of symbols attached to the leaf nodes of a partial parse tree $\mathcal{P}$,
$a_1 \cdots a_n$ is said to be the yield of $\mathcal{P}$. If a parse tree $\mathcal{P}$ of $G$ yields $a_1 \cdots a_n$, then $\mathcal{P}$ is called
a parse of $a_1 \cdots a_n$ in $G$.

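Example 6.2 Let $\Sigma = \{he, him, likes\}$ be our alphabet and let $G$ be a Lambek grammar such
that

G : likes $\mapsto (np \backslash s)/np$,
    he $\mapsto s/(np \backslash s)$,
    him $\mapsto (s/np) \backslash s$.

Then the following is a parse for he likes him:

[Figure: parse tree of he likes him. Only its first step survives in this copy: likes, of type $(np \backslash s)/np$, is combined with a hypothetical assumption $[np]$.]
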

6.2 Tree Languages and Automata 

In order to fully appreciate the peculiarity of Lambek grammars with respect to their strong 
generative capacity, we recall here some basic definitions about the notion of tree language 
as presented in [Tie99].

Definition 6.3 (Trees and tree languages) A tree is a term over a finite signature $\Sigma$
containing function and constant symbols. The set of $n$-ary function symbols in $\Sigma$ will be
denoted by $\Sigma_n$. The set of all terms over $\Sigma$ will be denoted by $T_\Sigma$; a subset of $T_\Sigma$ is called a
tree language or a forest.






Definition 6.4 (Yield of a tree) The yield of a tree $t$ is defined by

$\mathrm{yield}(c) = c$, for $c \in \Sigma_0$,
$\mathrm{yield}(f(t_1, \ldots, t_n)) = \mathrm{yield}(t_1), \ldots, \mathrm{yield}(t_n)$, for $f \in \Sigma_n$, $n > 0$.

Thus, the yield of a tree is the string of symbols occurring as its leaves.

Definition 6.5 (Root of a tree) The root of a tree $t$ is defined by

$\mathrm{root}(c) = c$, for $c \in \Sigma_0$,
$\mathrm{root}(f(t_1, \ldots, t_n)) = f$, for $f \in \Sigma_n$, $n > 0$.
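
The two recursive definitions above can be sketched in a few lines of Python. The encoding is an assumption made only for this illustration: a tree is a tuple whose first component is the symbol and whose remaining components are the subtrees, so a constant is a 1-tuple. The names `tree_root` and `tree_yield` are hypothetical.

```python
# Hypothetical encoding (not from the report): a tree is a tuple
# (symbol, child_1, ..., child_n); a constant c is the 1-tuple (c,).

def tree_root(t):
    """root(c) = c and root(f(t_1, ..., t_n)) = f."""
    return t[0]


def tree_yield(t):
    """Left-to-right list of the constants occurring at the leaves of t."""
    if len(t) == 1:          # constant symbol: the yield is the symbol itself
        return [t[0]]
    leaves = []
    for child in t[1:]:      # concatenate the yields of the subtrees in order
        leaves.extend(tree_yield(child))
    return leaves


example = ("f", ("a",), ("g", ("b",), ("c",)))
print(tree_root(example))    # f
print(tree_yield(example))   # ['a', 'b', 'c']
```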

In the following subsections three increasingly more powerful classes of tree languages 
are presented: local, regular and context-free tree languages. Note that even if the names 
for these classes of tree languages are the same as those for classes of string languages, their 
meaning is very different. 

6.2.1 Local Tree Languages 

We can think of a local tree language as a tree language whose membership problem can be 
decided by just looking at some very simple (local) properties of trees. A formalization of 
such an intuitive notion is given by the following definitions: 

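Definition 6.6 (Fork of a tree) The fork of a tree $t$ is defined by

$\mathrm{fork}(c) = \emptyset$, for $c \in \Sigma_0$,
$\mathrm{fork}(f(t_1, \ldots, t_n)) = \{(f, \mathrm{root}(t_1), \ldots, \mathrm{root}(t_n))\} \cup \bigcup_{i=1}^{n} \mathrm{fork}(t_i)$.

Definition 6.7 (Fork of a tree language) For a tree language $L$, we define

$\mathrm{fork}(L) = \bigcup_{t \in L} \mathrm{fork}(t)$.

Note that, since $\Sigma$ is finite, $\mathrm{fork}(T_\Sigma)$ is always finite.

Definition 6.8 (Local tree language) A tree language $L \subseteq T_\Sigma$ is local if there are sets
$R \subseteq \Sigma$ and $E \subseteq \mathrm{fork}(T_\Sigma)$, such that, for all $t \in T_\Sigma$, $t \in L$ iff $\mathrm{root}(t) \in R$ and $\mathrm{fork}(t) \subseteq E$.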

Thatcher (see [Tha67]) characterized the relation between local tree languages and the
derivation trees of context-free string grammars by the following

Theorem 6.9 (Thatcher, 1967) S is the set of derivation trees of some context-free string 
grammar iff S is local. 






6.2.2 Regular Tree Languages 

Among many different equivalent definitions for regular tree languages, we follow Tiede's 
approach in choosing the following one, based on finite tree automata. 

Definition 6.10 (Finite tree automaton) A finite tree automaton is a quadruple $(\Sigma, Q, q_0, \Delta)$
such that

• $\Sigma$ is a finite signature,

• $Q$ is a finite set of unary states,

• $q_0 \in Q$ is the start state,

• $\Delta$ is a finite set of transition rules of the following type:

$q(c) \to c$, for $c \in \Sigma_0$,
$q(f(v_1, \ldots, v_n)) \to f(q_1(v_1), \ldots, q_n(v_n))$, for $f \in \Sigma_n$, $q, q_1, \ldots, q_n \in Q$.

We can think of a finite tree automaton as a device which scans non-deterministically a tree 
from root to frontier. It accepts a tree if it succeeds in reading the whole tree, it rejects it 
otherwise. 

In order to define the notion of tree language accepted by a finite tree automaton we
need to define the transition relation for finite tree automata.

Definition 6.11 A context is a term over $\Sigma \cup \{x\}$ containing the zero-ary term $x$ exactly
once.

Definition 6.12 Let $M = (\Sigma, Q, q_0, \Delta)$ be a finite tree automaton. The derivation relation

$\Rightarrow_M \subseteq T_{Q \cup \Sigma} \times T_{Q \cup \Sigma}$

is defined by $t \Rightarrow_M t'$ if for some context $s$ and some $t_1, \ldots, t_n \in T_\Sigma$, there is a rule in $\Delta$

$q(f(v_1, \ldots, v_n)) \to f(q_1(v_1), \ldots, q_n(v_n))$

and

$t = s[x \mapsto q(f(t_1, \ldots, t_n))]$,
$t' = s[x \mapsto f(q_1(t_1), \ldots, q_n(t_n))]$.

If we use $\Rightarrow_M^*$ to denote the reflexive, transitive closure of $\Rightarrow_M$, we say that a finite tree
automaton $M$ accepts a term $t \in T_\Sigma$ if $q_0(t) \Rightarrow_M^* t$. The tree language accepted by a finite tree
automaton $M$ is

$\{t \in T_\Sigma \mid q_0(t) \Rightarrow_M^* t\}$.

Definition 6.13 (Regular tree language) A tree language is regular if it is accepted by
a finite tree automaton.
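
The acceptance condition of a finite tree automaton can be sketched as a recursive, root-to-frontier check. This is only an illustration under assumed encodings: trees are tuples (symbol, subtrees...), a rule $q(c) \to c$ is stored as the pair (q, c), and a rule $q(f(v_1, \ldots, v_n)) \to f(q_1(v_1), \ldots, q_n(v_n))$ as the triple (q, f, (q_1, ..., q_n)). The example automaton is hypothetical.

```python
# Hypothetical encoding of a top-down, non-deterministic tree automaton.

def accepts(t, state, leaf_rules, inner_rules):
    """True iff the automaton, started in `state` at the root of t, can read all of t."""
    symbol, children = t[0], t[1:]
    if not children:
        return (state, symbol) in leaf_rules
    for (q, f, child_states) in inner_rules:
        if q == state and f == symbol and len(child_states) == len(children):
            # Non-determinism: any applicable rule that succeeds is enough.
            if all(accepts(c, qc, leaf_rules, inner_rules)
                   for c, qc in zip(children, child_states)):
                return True
    return False


# Illustrative automaton accepting the "right combs" f(a, f(a, ... b)).
leaf_rules = {("q0", "b"), ("qa", "a")}
inner_rules = {("q0", "f", ("qa", "q0"))}

comb = ("f", ("a",), ("f", ("a",), ("b",)))
print(accepts(comb, "q0", leaf_rules, inner_rules))                   # True
print(accepts(("f", ("b",), ("a",)), "q0", leaf_rules, inner_rules))  # False
```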

The following theorem (see [Koz97]) defines the relation between local and regular tree
languages:

Theorem 6.14 Every local tree language is regular.

while the following (see [GS84]) establishes a relation between regular tree languages and
context-free string languages:

Theorem 6.15 The yield of any regular tree language is a context-free string language.

6.2.3 Context-free Tree Languages

The final step in the definition of more and more powerful tree language classes is made
possible by introducing the notion of pushdown tree automaton. Again, we stick to Tiede's
approach in choosing Guessarian's useful definition (see [Gue83]):

Definition 6.16 (Pushdown tree automaton) A pushdown tree automaton is a system
$(\Sigma, \Gamma, Q, q_0, Z_0, \Delta)$, such that

• $\Sigma$ is a finite signature (the input signature),

• $\Gamma$ is a finite signature (the pushdown signature; we assume $\Sigma \cap \Gamma = \emptyset$),

• $Q$ is a finite set of binary states,

• $q_0 \in Q$ is the start state,

• $Z_0 \in \Gamma$ is the initial stack symbol,

• $\Delta$ is a finite set of rules of the form

$q(f(v_1, \ldots, v_n), E(x_1, \ldots, x_m)) \to f(q_1(v_1, \gamma_1), \ldots, q_n(v_n, \gamma_n))$,
$q(v, E(x_1, \ldots, x_m)) \to q'(v, \gamma')$,
$q(c) \to c$,

with $q, q', q_1, \ldots, q_n \in Q$, $c \in \Sigma_0$, $f \in \Sigma_n$, $n > 0$, $E \in \Gamma_m$, and
$\gamma', \gamma_1, \ldots, \gamma_n \in T_{\Gamma \cup \{x_1, \ldots, x_m\}}$.

The transition relation $\Rightarrow$ for pushdown tree automata can be defined straightforwardly
as a generalization of definition 6.12. A term $t$ is accepted by a pushdown automaton if
$q_0(t, Z_0) \Rightarrow^* t$, where $\Rightarrow^*$ is the reflexive, transitive closure of $\Rightarrow$.

Definition 6.17 (Context-free tree language) The language accepted by a pushdown 
tree automaton is called a context-free tree language. 

The relationship between regular and context-free tree languages is exemplified by the fol- 
lowing proposition: 

Proposition 6.18 The intersection of a regular and a context-free tree language is context- 
free. 

We know that the yield of a regular tree language is a context-free string language: there is 
a similar connection between the class of context-free tree languages and the class of indexed 
languages, as stated by the following 

Proposition 6.19 The yield of any context-free tree language is an indexed string language. 

Indexed languages have been proposed as an upper bound of the complexity of natural languages,
after it was shown that certain phenomena in natural languages cannot be described
with context-free grammars (see [Gaz88]).

6.3 Proof Trees as Structures for Lambek Grammars 

In [Tie99] Hans-Joerg Tiede proposes, in contrast with a previous approach by Buszkowski,
to take as the structure underlying a sentence generated by a Lambek grammar one of the
infinitely many proof trees of the deduction $A_1, \ldots, A_n \vdash s$, where $A_1, \ldots, A_n$ is a sequence of types
assigned by the grammar to each symbol, and $s$ is the distinguished atomic category.
Following Tiede's approach, we give the following

Definition 6.20 (Proof tree) A proof tree for a Lambek grammar is a term over the
signature $\Sigma = \{[/E], [\backslash E], [/I], [\backslash I], [ID]\}$ where

• $[ID]$ is the 0-ary function symbol,

• $[/E]$ and $[\backslash E]$ are the binary function symbols,

• $[/I]$ and $[\backslash I]$ are the unary function symbols.

The terms over this signature represent proof trees that neither have information about the
formulas for which they are a proof, nor about the strings that are generated by a grammar
using this proof. These terms represent proofs unambiguously, since the assumption
discharged by an introduction rule is univocally determined by the position of the corresponding
$[/I]$ or $[\backslash I]$ function symbol in the proof tree.

Example 6.21 The term $t = [\backslash I]([\backslash E]([ID], [ID]))$ is an example of a well-formed term over
this signature. There is no need for additional information about the discharged assumption
since, as we can see from the tree-like representation of the term, the discharged assumption
is unambiguously identified.

[Figure: tree representation of $t$, with root $[\backslash I]$.]

The following terms are examples of ill-formed proof trees, which cannot belong to the tree language
generated by any Lambek grammar:

• $[\backslash E](x, [/I](y))$. Since the major premise of the $\backslash E$ function symbol must have
the shape $(\ldots) \backslash (\ldots)$, there is no way to reduce that term by a $\backslash E$ rule;

• $[/E]([\backslash I](x), y)$. Analogous to the previous situation;

• $[\backslash I]([\backslash E](x, [ID]))$, if the term $x$ does not contain at least two uncancelled assumptions;

• $[/I]([/E]([ID], x))$, if the term $x$ does not contain at least two uncancelled assumptions.

By taking a proof tree as the structure of the sentences generated by Lambek grammars,
Tiede proved some important results about their strong generative capacity, that is, the
set of the structures assigned by a grammar to the sentences it generates. Since strong
generative capacity can provide a formal notion of the linguistic concept of structure of a
sentence, this result justifies the current interest in Lambek grammars as a promising
mathematical tool for linguistic purposes.

Theorem 6.22 (Tiede, 1999) The set of well-formed proof trees of the Lambek Calculus 
is not regular. 

Theorem 6.23 (Tiede, 1999) The set of proof trees of the Lambek Calculus is a context- 
free tree language. 

These two theorems show that the language of proof trees is properly a context-free tree
language.

In particular, these theorems show that Lambek grammars are more powerful, with respect
to strong generative capacity, than context-free grammars, whose structure language
is a local tree language as shown in theorem 6.9.

We can easily introduce the notion of normal form proof tree by simply extending the
notion of normal form proof as presented in definition 5.15. We can say that for normal
form trees, in addition to the rules that prohibit terms of the form

$[\backslash E](x, [/I](y))$,
$[/E]([\backslash I](x), y)$,

we have rules that prohibit terms of the form

$[\backslash E](x, [\backslash I](y))$,
$[/E]([/I](x), y)$,

and terms of the form

$[/I]([/E](x, [ID]))$,
$[\backslash I]([\backslash E]([ID], y))$,

which correspond to $\beta$-redexes and $\eta$-redexes, respectively, as one can easily see from their
definition.

We can easily extend to the formalism of proof trees the "reduction rules" we have seen in
section 5.5 to get a normal form proof tree out of a non-normal one.

[Figure: reduction rules on proof trees; not recoverable in this copy.]

As a corollary of theorem 6.22, Tiede proves that

Theorem 6.24 (Tiede, 1999) The set of normal form proof trees of the Lambek Calculus 
is not regular, 

which, together with 

Theorem 6.25 The set of normal form proofs of the Lambek Calculus is a context-free tree 
language 

shows that the tree language of normal form proof trees of Lambek Calculus is properly a 
context-free tree language. 



6.4 Proof-tree Structures 

Given a Lambek grammar $G$, a proof-tree structure over its alphabet $\Sigma$ is a unary-binary
branching tree whose leaf nodes are labeled by either $[ID]$ (these are called "discharged leaf
nodes") or symbols of $\Sigma$ and whose internal nodes are labeled by either $\backslash E$, $/E$, $\backslash I$, or $/I$.

The set of proof-tree structures over $\Sigma$ is denoted $\Sigma^P$. Often we will simply say 'structure'
to mean proof-tree structure. A set of proof-tree structures over $\Sigma$ is called a structure
language over $\Sigma$.

Example 6.26 The following is an example of a proof-tree structure for the sentence he
likes him seen in example 6.2:

[Figure: proof-tree structure of he likes him; not recoverable in this copy.]

Let $G$ be a Lambek grammar, and let $\mathcal{P}$ be a partial parse tree of $G$. The result of
stripping $\mathcal{P}$ of its type labels is a proof-tree structure, which is called the proof-tree structure
of $\mathcal{P}$. If $T$ is the structure of a parse tree $\mathcal{P}$, we say that $\mathcal{P}$ is a parse of $T$.

We say that a Lambek grammar $G$ generates a structure $T$ if and only if for some parse
tree $\mathcal{P}$ of $G$, $T$ is the structure of $\mathcal{P}$. The set of structures generated by $G$ is called the
(proof-tree) structure language of $G$ and is denoted $PL(G)$. In order to distinguish $L(G)$,
the language of $G$, from $PL(G)$, its structure language, we often call the former the string
language of $G$.

The yield of a proof-tree structure $T$ is the string of symbols $a_1, \ldots, a_n$ labeling the
undischarged leaf nodes of $T$, from left to right in this order. The yield of $T$ is denoted
$\mathrm{yield}(T)$. Note that $L(G) = \{\mathrm{yield}(T) \mid T \in PL(G)\}$.

6.5 Decidable and Undecidable Problems about 
Lambek Grammars 

Since, as stated by theorem 5.17, Lambek calculus is decidable, the universal membership
problem "$s \in L(G)$" is decidable for any sentence $s$ and any Lambek grammar $G$.

On the other hand, the questions "$L(G_1) = L(G_2)$" and "$L(G_1) \subseteq L(G_2)$" for arbitrary
Lambek grammars $G_1$ and $G_2$ are undecidable, because the same questions are undecidable
for context-free grammars and there exists an effective procedure for converting a context-free
grammar $G'$ to a Lambek grammar $G$ such that $L(G') = L(G)$.

Given a proof-tree structure $t$, the question "$t \in PL(G)$" is decidable. In fact, as shown
by Tiede in theorem 6.23, every proof tree language of a Lambek grammar is a context-free tree
language, and the membership problem is decidable for context-free tree languages (it suffices to
run the pushdown tree automaton on $t$).

Unfortunately, the question "$PL(G_1) \subseteq PL(G_2)$" has been proved decidable only for
non-associative Lambek grammars $G_1, G_2$. Whether it is decidable for (associative)
Lambek grammars is still an open question and the subject of active research in this field.

6.6 Substitutions 

In this section we introduce the notion of a Lambek grammar being a substitution instance
of another. Besides, we define a notion of size of a Lambek grammar that will be decisive
in our proof of learnability for Rigid Lambek Grammars, presented later.

First of all, let us define what we mean when we say that a Lambek grammar is a subset of
another one:

Definition 6.27 Let $G_1, G_2$ be Lambek grammars; we say that $G_1 \subseteq G_2$ if and only if for
any $a \in \Sigma$ such that $G_1 : a \mapsto A$ we also have $G_2 : a \mapsto A$.

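Example 6.28 Let $\{Francesca, loves, Paolo\} \subseteq \Sigma$ and let

$G_1$ : Francesca $\mapsto np$
        loves $\mapsto np \backslash s$

$G_2$ : Francesca $\mapsto np$
        loves $\mapsto np \backslash s, \; np \backslash (s/np)$
        Paolo $\mapsto np$

Obviously, $G_1 \subseteq G_2$.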
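
The inclusion of Definition 6.27 amounts to a pointwise subset test on type assignments. A minimal Python sketch, assuming a grammar is represented as a dictionary from symbols to sets of types written as plain strings (a representation chosen here for brevity, not taken from the report):

```python
# Hypothetical representation: grammar = {symbol: set of types (as strings)}.

def is_subgrammar(g1, g2):
    """G1 is included in G2 iff every assignment a -> A of G1 is also in G2."""
    return all(types <= g2.get(symbol, set()) for symbol, types in g1.items())


G1 = {"Francesca": {"np"}, "loves": {"np\\s"}}
G2 = {"Francesca": {"np"},
      "loves": {"np\\s", "np\\(s/np)"},
      "Paolo": {"np"}}

print(is_subgrammar(G1, G2))   # True
print(is_subgrammar(G2, G1))   # False
```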

Definition 6.29 A substitution is a function $\sigma : Var \to Tp$ that maps variables to types.
We can extend it to a function from types to types by setting

$\sigma(t) = t$
$\sigma(A/B) = \sigma(A)/\sigma(B)$
$\sigma(A \backslash B) = \sigma(A) \backslash \sigma(B)$

for all $A, B \in Tp$.

We use the notation $\{x_1 \mapsto A_1, \ldots, x_n \mapsto A_n\}$ to denote the substitution $\sigma$ such that
$\sigma(x_1) = A_1, \ldots, \sigma(x_n) = A_n$ and $\sigma(y) = y$ for all other variables $y$.

Example 6.30 Let $\sigma = \{x \mapsto x \backslash y, \; y \mapsto s, \; z \mapsto s/(s/x)\}$. Then

$\sigma((s/x) \backslash y) = (s/(x \backslash y)) \backslash s$

and

$\sigma(((s/x) \backslash y)/(x/z)) = ((s/(x \backslash y)) \backslash s)/((x \backslash y)/(s/(s/x)))$.
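
The homomorphic extension of a substitution to types can be sketched in Python as follows. The representation is an assumption of this sketch: atoms (variables and primitive types) are strings, and a compound type is a triple (connective, left, right), so that ("/", A, B) stands for A/B and ("\\", A, B) for A\B. The substitution is a dictionary, applied simultaneously. The example reproduces the two computations of example 6.30.

```python
# Hypothetical encoding: atoms are strings; ("/", A, B) is A/B; ("\\", A, B) is A\B.

def substitute(sigma, ty):
    """Apply the substitution sigma (a dict on variables) to the type ty."""
    if isinstance(ty, str):
        return sigma.get(ty, ty)      # atoms outside the domain are left unchanged
    op, left, right = ty
    return (op, substitute(sigma, left), substitute(sigma, right))


def show(ty):
    """Fully parenthesized string form of a type."""
    if isinstance(ty, str):
        return ty
    op, left, right = ty
    return "(" + show(left) + op + show(right) + ")"


sigma = {"x": ("\\", "x", "y"), "y": "s", "z": ("/", "s", ("/", "s", "x"))}
first = ("\\", ("/", "s", "x"), "y")      # (s/x)\y
second = ("/", first, ("/", "x", "z"))    # ((s/x)\y)/(x/z)

print(show(substitute(sigma, first)))     # ((s/(x\y))\s)
print(show(substitute(sigma, second)))    # (((s/(x\y))\s)/((x\y)/(s/(s/x))))
```
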

The following definition introduces the notion of a Lambek grammar being a substitution
instance of another:

Definition 6.31 Let $G = (\Sigma, s, F)$ be a Lambek grammar, and $\sigma$ a substitution. Then $\sigma[G]$
denotes the grammar obtained by applying $\sigma$ in the type assignment of $G$, that is:

$\sigma[G] = (\Sigma, s, \sigma \cdot F)$

$\sigma[G]$ is called a substitution instance of $G$.

It is easy to prove for Lambek grammars as well the following straightforward but important fact, which
was first proved for CCGs in [BP93]:

Proposition 6.32 If $\sigma[G_1] \subseteq G_2$, then the set of proof-tree structures generated by $G_1$ is a
subset of the set of proof-tree structures generated by $G_2$, that is, $PL(G_1) \subseteq PL(G_2)$.

Proof. Suppose $\sigma[G_1] \subseteq G_2$. Let $T \in PL(G_1)$ and let $\mathcal{P}$ be a parse of $T$ in $G_1$. Let $\sigma[\mathcal{P}]$ be
the result of replacing each type label $A$ of $\mathcal{P}$ by $\sigma(A)$. Then it is easy to see that $\sigma[\mathcal{P}]$ is a
parse of $T$ in $G_2$. Therefore, $T \in PL(G_2)$.

Corollary 6.33 If $\sigma[G_1] \subseteq G_2$, then $L(G_1) \subseteq L(G_2)$.

Proof. Immediate from the previous proposition and the remark at the end of section 6.4.

A substitution that is a one-to-one function from $Var$ to $Var$ is called a variable renaming.
If $\sigma$ is a variable renaming, then $G$ and $\sigma[G]$ are called alphabetic variants. Obviously
grammars that are alphabetic variants have exactly the same shape and are identical for all
purposes. Therefore, grammars that are alphabetic variants are treated as identical.

Proposition 6.34 Suppose $\sigma_1[G_1] = G_2$ and $\sigma_2[G_2] = G_1$. Then $G_1$ and $G_2$ are alphabetic
variants and thus are equal.

Proof. For each symbol $c \in \Sigma$, $\sigma_1$ and $\sigma_2$ provide a one-to-one correspondence between
$\{A \mid G_1 : c \mapsto A\}$ and $\{A \mid G_2 : c \mapsto A\}$. Indeed, if $\sigma_1$ did not and, say, $\{\sigma_1(A) \mid G_1 : c \mapsto
A\} \subset \{A \mid G_2 : c \mapsto A\}$, then $\sigma_2[G_2] = \sigma_2[\sigma_1[G_1]]$ could not be equal to $G_1$, and likewise
for $\sigma_2$. Then, it is easy to see that $\sigma_1 \upharpoonright Var(G_1)$ is a one-to-one function from $Var(G_1)$
onto $Var(G_2)$, and $\sigma_2 \upharpoonright Var(G_2) = (\sigma_1 \upharpoonright Var(G_1))^{-1}$. One can extend $\sigma_1 \upharpoonright Var(G_1)$ to a
variable renaming $\sigma$. Then $\sigma[G_1] = \sigma_1[G_1] = G_2$.

6.7 Grammars in Reduced Form 

Definition 6.35 A substitution $\sigma$ is said to be faithful to a grammar $G$ if the following
condition holds:

for all $c \in dom(G)$, if $G : c \mapsto A$, $G : c \mapsto B$, and $A \ne B$, then $\sigma(A) \ne \sigma(B)$.

Example 6.36 Let $G$ be the following grammar

$G$ : Francesca $\mapsto x$,
      dances $\mapsto x \backslash s, \; y$,
      well $\mapsto y \backslash (x \backslash s)$.

Let

$\sigma_1 = \{y \mapsto x\}$,
$\sigma_2 = \{y \mapsto x \backslash s\}$.

Then $\sigma_1$ is faithful to $G$, while $\sigma_2$ is not.

Definition 6.37 Let $\sqsubseteq$ be a binary relation on grammars such that $G_1 \sqsubseteq G_2$ if and only if
there exists a substitution $\sigma$ with the following properties:

• $\sigma$ is faithful to $G_1$;

• $\sigma[G_1] \subseteq G_2$.

From the definition above and proposition 6.34 it is immediate to prove the following:

Proposition 6.38 $\sqsubseteq$ is reflexive, transitive and antisymmetric.

Definition 6.39 For any grammar $G$, define the size of $G$, $size(G)$, as follows:

$size(G) = \sum_{c \in \Sigma} \sum_{G : c \mapsto A} |A|$,

where, for each type $A$, $|A|$ is the number of symbol occurrences in $A$.

Lemma 6.40 If $G_1 \sqsubseteq G_2$, then $size(G_1) \le size(G_2)$.

Proof. For any type $A$ and any substitution $\sigma$, $|A| \le |\sigma(A)|$. Then the lemma is immediate
from the definition of $\sqsubseteq$.

Corollary 6.41 For any grammar $G$, the set $\{G' \mid G' \sqsubseteq G\}$ is finite.

Proof. By lemma 6.40, $\{G' \mid G' \sqsubseteq G\} \subseteq \{G' \mid size(G') \le size(G)\}$. The latter set must be finite,
because for any $n \in \mathbb{N}$, there are only finitely many grammars $G$ such that $size(G) = n$.

If we write $G_1 \sqsubset G_2$ to mean $G_1 \sqsubseteq G_2$ and $G_1 \ne G_2$, we have

Corollary 6.42 $\sqsubset$ is well-founded.

Definition 6.43 A grammar $G$ is said to be in reduced form if there is no $G'$ such that
$G' \sqsubset G$ and $PL(G) = PL(G')$.

7 Lambek Grammars as a Linguistic Tool

7.1 Lambek Grammars and Syntax

As explicitly stated in the original paper wherein Lambek laid the foundations of the Lambek 
Calculus, his aim was 

[...] to obtain an effective rule (or algorithm) for distinguishing sentences from 
nonsentences, which works not only for the formal languages of interest to the 
mathematical logician, but also for natural languages such as English, or at least 
for fragments of such languages. ([Lam58])

That's why, even if Lambek grammars can be simply considered as interesting mathe- 
matical objects, it will be useful to underline here some properties that make them also an 
interesting tool to formalize some phenomena in natural languages. 

The importance of Lambek's approach to grammatical reasoning lies in the development
of a uniform deductive account of the composition of form and meaning in natural language:
formal grammar is presented as a logic, that is, a system for reasoning about linguistic
structures.

The basic idea underlying the notion of Categorial Grammar on which Lambek based 
his approach is that a grammar is a formal device to assign to each word (a symbol of 
the alphabet of the grammar) or expression (an ordered sequence of words) one or more 
syntactic types that describe their function. Types can be considered as a formalization of 
the linguistic notion of parts of speech. 

CCGs assign to each symbol a fixed set of types, and provide two composition rules to
derive the type of a sequence of words out of the types of its components. Such a "fixed
types" approach leads to some difficulties: to formalize some linguistic phenomena we should
add further rules to the two elimination rules defined for CCGs, as described earlier.
In the following subsections we present some examples where the deductive approach of
Lambek grammars leads to a more elegant and consistent formalization of such linguistic
phenomena.

In the following subsections we take s as the primitive type of well-formed sentences in 
our language and np as the primitive type for noun phrases (such as John, Mary, he). 

7.1.1 Transitive verbs 

Transitive verbs require a name on both their left-hand and right-hand sides, as is apparent
from the well-formedness of the following sentences.

$np$     $(np \backslash s)/np$    $np$
John     (likes                    Mary)

$np$     $np \backslash (s/np)$    $np$
(John    likes)                    Mary

Both parenthesizations lead to a derivation of $s$ as the type of the whole expression. This would
mean that in a CCG we should assign to any transitive verb at least two distinct types:
$(np \backslash s)/np$ and $np \backslash (s/np)$.

On the contrary, in a Lambek grammar, since we can prove both

$(np \backslash s)/np \vdash np \backslash (s/np)$

and

$np \backslash (s/np) \vdash (np \backslash s)/np$

we can simply assign to a transitive verb the type $np \backslash s/np$ without any further parenthesization.

7.1.2 Pronouns 

If we try to assign a proper type to the personal pronoun he we notice that its type must be such
that the following sentences are well-formed:

        $np \backslash s$
he      works,

        $np \backslash s/np$    $np$
he      likes                   Jane.

We have two choices: either we give he the same type as a name (that is, $np$) or we give it
the type $s/(np \backslash s)$. In the first case there is a problem: expressions like Jane likes he would be
considered well-formed sentences. So, we assign to he the type $s/(np \backslash s)$.

Analogously, since the personal pronoun him makes the following sentences well-formed:

$np$     $np \backslash s/np$
Jane     likes                  him,

$np$     $np \backslash s$     $s \backslash s/np$
Jane     works                 for                   him,

we assign to him the type $(s/np) \backslash s$ (and not type $np$, since otherwise expressions like him likes John
would be well-formed).

Since a pronoun is, according to its own definition, something that "stands for a noun",
we wish that in our grammar each occurrence of a pronoun could be replaced by a name
(while the converse is not always true). But this means that any name (say, John, of type
$np$) should also be assigned the types of he and him, that is, respectively, type $s/(np \backslash s)$ and
type $(s/np) \backslash s$. In other words, we need something that accounts for type-raising. But
since in Lambek Calculus we can prove

$np \vdash s/(np \backslash s)$
$np \vdash (s/np) \backslash s$

for any $np$ and $s$, a Lambek grammar provides a very natural formalization of the relationship
between names and pronouns: while a name can always be substituted for a pronoun in a
sentence (and the type-raising derivation guarantees that a name can always "behave like"
a pronoun if we need it to), the converse is not true (the converse of the type-raising proof
doesn't hold in Lambek Calculus). The proof of the first deduction is reported in example
5.10 as a derivation in a Lambek grammar.

7.1.3 Adverbs 

If we look for the proper type for adverbs like here we can consider the well-formed sentence
John works here. We can choose between two possible parenthesizations, that is:

$np$     $np \backslash s$
(John    works)              here

$np$     $np \backslash s$
John     (works              here)

The first one suggests for here the type $s \backslash s$, while the second one suggests the type $(np \backslash s) \backslash (np \backslash s)$.
The good news is that, while in a CCG we should assign each adverb at least two different
types, in a Lambek grammar we can prove that

$s \backslash s \vdash (np \backslash s) \backslash (np \backslash s)$

that is to say, in Lambek grammars any adverbial expression of type $s \backslash s$ also has type
$(np \backslash s) \backslash (np \backslash s)$. More generally, we can show that in Lambek Calculus

$x \backslash y \vdash (z \backslash x) \backslash (z \backslash y)$
$x/y \vdash (x/z)/(y/z)$.
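
Derivability claims of this kind can be checked mechanically. The sketch below is a naive backward proof search for the product-free Lambek calculus with non-empty antecedents. Note that it uses a cut-free sequent-style formulation rather than the natural deduction presentation adopted in this report; the two are standardly known to derive the same sequents. The encoding of types (strings for atoms, ("/", A, B) for A/B, ("\\", A, B) for A\B) and the function name `derivable` are assumptions of this illustration.

```python
from functools import lru_cache

# Hypothetical encoding: atoms are strings; ("/", A, B) is A/B (argument B on
# the right); ("\\", A, B) is A\B (argument A on the left).


@lru_cache(maxsize=None)
def derivable(antecedent, goal):
    """Decide antecedent |- goal, where antecedent is a non-empty tuple of types."""
    if antecedent == (goal,):                       # identity axiom
        return True
    if not isinstance(goal, str):                   # right (introduction) rules
        op, left, right = goal
        if op == "/" and derivable(antecedent + (right,), left):
            return True
        if op == "\\" and derivable((left,) + antecedent, right):
            return True
    n = len(antecedent)
    for i, ty in enumerate(antecedent):             # left rules on each compound type
        if isinstance(ty, str):
            continue
        op, left, right = ty
        if op == "/":                               # A/B consumes a block proving B on its right
            for j in range(i + 2, n + 1):
                if (derivable(antecedent[i + 1:j], right)
                        and derivable(antecedent[:i] + (left,) + antecedent[j:], goal)):
                    return True
        else:                                       # A\B consumes a block proving A on its left
            for j in range(i):
                if (derivable(antecedent[j:i], left)
                        and derivable(antecedent[:j] + (right,) + antecedent[i + 1:], goal)):
                    return True
    return False


raised = ("/", "s", ("\\", "np", "s"))              # s/(np\s)
tv1 = ("/", ("\\", "np", "s"), "np")                # (np\s)/np
tv2 = ("\\", "np", ("/", "s", "np"))                # np\(s/np)
adv1 = ("\\", "s", "s")                             # s\s
adv2 = ("\\", ("\\", "np", "s"), ("\\", "np", "s")) # (np\s)\(np\s)

print(derivable(("np",), raised))                   # True  (type-raising)
print(derivable((raised,), "np"))                   # False (no converse)
print(derivable((tv1,), tv2), derivable((tv2,), tv1))   # True True
print(derivable((adv1,), adv2))                     # True
```

Every premise of a rule contains strictly fewer connectives than its conclusion, so the search terminates; memoization with `lru_cache` only avoids recomputing repeated subgoals.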

7.1.4 Hypothetical reasoning 

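In the following example, sentences $s$, noun phrases $np$, common nouns $n$, and prepositional
phrases $pp$ are taken to be "complete expressions", whereas the verb dances, the determiner
the and the preposition with are categorized as incomplete with respect to these complete
phrases.

Example 7.1 Here is the derivation for the sentence Francesca dances with the boy.

[Derivation tree, read from the leaves to the root:]

the : $np/n$ and boy : $n$ give the boy : $np$ by [/E];
with : $pp/np$ and the boy : $np$ give with the boy : $pp$ by [/E];
dances : $(np \backslash s)/pp$ and with the boy : $pp$ give dances with the boy : $np \backslash s$ by [/E];
Francesca : $np$ and dances with the boy : $np \backslash s$ give Francesca dances with the boy : $s$ by [\E].
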
This is an example of grammatical reasoning where, on the basis of the types we assigned
to each word, we infer the well-formedness of a sequence of words. On the other hand we
can assume a different perspective: knowing that a sentence is well-formed, what can be
said about the type of its components? In the words of Lambek: "Given the information
about the categorization of a composite structure, what conclusions could be drawn about
the categorization of its parts?" ([Lam58]). That is where the following inference patterns
come into play:

from $\Gamma, B \vdash A$, infer $\Gamma \vdash A/B$,
from $B, \Gamma \vdash A$, infer $\Gamma \vdash B \backslash A$,

which give a linguistic interpretation of the role of the "introduction" rules. This is what is
done in the following derivation, which allows us to infer that the expression the boy whom Francesca
dances with is of type $np$:
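
[Derivation tree, read from the leaves to the root:]

with : $pp/np$ and the hypothesis $[np]$ give with $[np]$ : $pp$ by [/E];
dances : $(np \backslash s)/pp$ and with $[np]$ : $pp$ give dances with $[np]$ : $np \backslash s$ by [/E];
Francesca : $np$ and dances with $[np]$ : $np \backslash s$ give Francesca dances with $[np]$ : $s$ by [\E];
discharging $[np]$ gives Francesca dances with : $s/np$ by [/I];
whom : $(n \backslash n)/(s/np)$ and Francesca dances with : $s/np$ give whom Francesca dances with : $n \backslash n$ by [/E];
boy : $n$ and whom Francesca dances with : $n \backslash n$ give boy whom Francesca dances with : $n$ by [\E];
the : $np/n$ and boy whom Francesca dances with : $n$ give the boy whom Francesca dances with : $np$ by [/E].
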

Since the relative pronoun whom (of type $(n \backslash n)/(s/np)$) wants to enter into composition
on its right with the relative clause body, we would like to assign type $s/np$ to the latter. In
order to show that Francesca dances with is indeed of type $s/np$, we make a hypothetical
assumption and suppose that there is a "ghost word" of type $np$ on its right. It is easy to derive
the category $s$ for the sentence Francesca dances with $np$. By withdrawing the hypothetical
$np$ assumption, we conclude that Francesca dances with has type $s/np$.

We can say that the cancelled hypothesis is the analogue of a "trace" à la Chomsky,
moving whom before Francesca.

7.1.5 Transitivity 

In the framework of CCGs a difficulty arises when we try to show the well-formedness of

$s/(np \backslash s)$     $np \backslash s/np$     $(s/np) \backslash s$
he                        likes                     him

so some authors proposed to introduce two new rules, which are often referred to as 'transitivity
rules':

$(x/y)(y/z) \to x/z$,

together with its mirror image for $\backslash$.

It is easy to show that such rules are derivable in Lambek Calculus, as we can see from
the following proof tree:

[Proof tree: from $y/z$ and a hypothesis $[z]$ derive $y$ by [/E]; from $x/y$ and $y$ derive $x$ by [/E]; discharging $[z]$ by [/I] gives $x/z$.]

7.2 Lambek Grammars and Montague Semantics 

From a linguistic point of view, one of the main reasons of interest in Lambek grammars lies
in the natural interface that proof-tree structures provide for Montague-like semantics. Just
as the Curry-Howard isomorphism shows that simply typed $\lambda$-terms can be seen as proofs in
intuitionistic logic, and vice versa, the syntactic analysis of a sentence in a Lambek grammar
is a proof in Lambek calculus, which is naturally embedded into intuitionistic logic. Indeed,
if we read $B/A$ and $A \backslash B$ as the intuitionistic implication $A \to B$, every rule in Lambek
calculus is a rule of intuitionistic logic.

In order to fully appreciate this relation between syntax and semantics, which is particularly
strong for Lambek grammars, we define a morphism between syntactic types and
semantic types: the latter are formulas of a minimal logic (where the only allowed connective
is $\to$, that is, intuitionistic implication) built on the two types $e$ (entity) and $t$ (truth values).

(Syntactic type)* = Semantic type

$s^* = t$ (a sentence is a proposition)

$sn^* = e$ (a noun phrase denotes an entity)

$n^* = e \to t$ (a noun is a subset of entities)

$(A \backslash B)^* = (B/A)^* = A^* \to B^*$ extends $(\_)^*$ to every type.
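
The morphism from syntactic to semantic types is a direct recursion on the structure of the type. A minimal Python sketch, under an assumed encoding where atoms are strings, ("/", B, A) stands for B/A, ("\\", A, B) stands for A\B, and ("->", X, Y) stands for the semantic type X → Y:

```python
# Hypothetical encoding: ("/", B, A) is B/A, ("\\", A, B) is A\B,
# ("->", X, Y) is the semantic type X -> Y.
BASE = {"s": "t", "sn": "e", "n": ("->", "e", "t")}


def semantic_type(ty):
    """Image of a syntactic type under the morphism (_)*."""
    if isinstance(ty, str):
        return BASE[ty]
    op, left, right = ty
    if op == "/":                 # B/A: argument on the right, result on the left
        argument, result = right, left
    else:                         # A\B: argument on the left, result on the right
        argument, result = left, right
    return ("->", semantic_type(argument), semantic_type(result))


# ((sn\s)/sn)\(sn\s), the type used below for "themselves"
themselves = ("\\", ("/", ("\\", "sn", "s"), "sn"), ("\\", "sn", "s"))
print(semantic_type(themselves))
# ('->', ('->', 'e', ('->', 'e', 't')), ('->', 'e', 't'))
```

Both directions of slash are sent to the same arrow type, which is why the semantic image forgets word order.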

The lexicon also associates to every word $w$ a $\lambda$-term $\tau_k$ for every syntactic type $t_k \in
\mathcal{L}(w)$, such that the type of $\tau_k$ is precisely $t_k^*$, the semantic type corresponding to that syntactic
type. We introduce some constants for representing logical operations of quantification,
conjunction etc:

Constant      Type
$\exists$     $(e \to t) \to t$
$\forall$     $(e \to t) \to t$
$\wedge$      $t \to (t \to t)$
$\vee$        $t \to (t \to t)$
$\supset$     $t \to (t \to t)$

Let the following be given:

• a syntactical analysis of $w_1 \ldots w_n$ in Lambek calculus, that is to say, a derivation $\mathcal{D}$
of $t_1, \ldots, t_n \vdash s$, and

• the semantics for every word $w_1, \ldots, w_n$, that is to say, $\lambda$-terms $\tau_i : t_i^*$;

then we get the semantics of the sentence by simply applying the following algorithm:

• Substitute in $\mathcal{D}$ every syntactic type with its corresponding semantic image; since
intuitionistic logic is an extension of Lambek calculus, we get a derivation $\mathcal{D}^*$ in
intuitionistic logic of $t_1^*, \ldots, t_n^* \vdash t = s^*$;

• by the Curry-Howard isomorphism, this derivation in intuitionistic logic can be seen as
a simply typed $\lambda$-term $\mathcal{D}^*_\lambda$, containing a free variable $x_i$ of type $t_i^*$ for every word $w_i$;

• in $\mathcal{D}^*_\lambda$ replace each variable $x_i$ with the $\lambda$-term $\tau_i$, which has the same type $t_i^*$;

• reduce the $\lambda$-term resulting at the end of the previous step; the result is the semantic
representation of the analyzed sentence.

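Let's consider the following example (taken from [Ret96]):

word          Syntactic type $t$
              Semantic type $t^*$
              Semantic representation: a $\lambda$-term of type $t^*$

some          $(s/(sn \backslash s))/n$
              $(e \to t) \to ((e \to t) \to t)$
              $\lambda P : e \to t \;\; \lambda Q : e \to t \;\; (\exists(\lambda x : e \, (\wedge (P x)(Q x))))$

sentences     $n$
              $e \to t$
              $\lambda x : e \, (\mathrm{sentence} \; x)$

talkabout     $sn \backslash (s/sn)$
              $e \to (e \to t)$
              $\lambda x : e \;\; \lambda y : e \, ((\mathrm{talkabout} \; x) y)$

themselves    $((sn \backslash s)/sn) \backslash (sn \backslash s)$
              $(e \to (e \to t)) \to (e \to t)$
              $\lambda P : e \to (e \to t) \;\; \lambda x : e \, ((P x) x)$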
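
First of all, we prove that Some sentences talk about themselves is a well-formed
sentence, that is, it belongs to the language generated by the lexicon at issue. This means
building a natural deduction of:

$(s/(sn \backslash s))/n, \;\; n, \;\; sn \backslash (s/sn), \;\; ((sn \backslash s)/sn) \backslash (sn \backslash s) \vdash s$.

If we indicate with $S, N, T, M$ the left-hand sides of the sequents for the four syntactic types, we get

$S \vdash (s/(sn \backslash s))/n$ and $N \vdash n$ give $S, N \vdash s/(sn \backslash s)$ by [/E];
$T \vdash (sn \backslash s)/sn$ and $M \vdash ((sn \backslash s)/sn) \backslash (sn \backslash s)$ give $T, M \vdash sn \backslash s$ by [\E];
$S, N \vdash s/(sn \backslash s)$ and $T, M \vdash sn \backslash s$ give $S, N, T, M \vdash s$ by [/E].

By applying the isomorphism between syntactic and semantic types, we get the following
intuitionistic proof, where $S^*, N^*, T^*, M^*$ are the abbreviations for the semantic types associated
to $S, N, T, M$:

$S^* \vdash (e \to t) \to (e \to t) \to t$ and $N^* \vdash e \to t$ give $S^*, N^* \vdash (e \to t) \to t$ by [$\to$E];
$T^* \vdash e \to e \to t$ and $M^* \vdash (e \to e \to t) \to e \to t$ give $T^*, M^* \vdash e \to t$ by [$\to$E];
$S^*, N^* \vdash (e \to t) \to t$ and $T^*, M^* \vdash e \to t$ give $S^*, N^*, T^*, M^* \vdash t$ by [$\to$E].

The $\lambda$-term coding this proof is simply $((s\,n)(m\,t))$ of type $t$, where $s, n, t, m$ are variables
of types respectively $S^*, N^*, T^*, M^*$.

By replacing these variables with the $\lambda$-terms of the same types associated by the lexicon to
the words, we get the following $\lambda$-term of type $t$:

$((\lambda P \, \lambda Q \, (\exists(\lambda x (\wedge (P x)(Q x)))))(\lambda x \, (\mathrm{sentence} \; x)))$
$((\lambda P \, \lambda x \, ((P x) x))(\lambda x \, \lambda y \, ((\mathrm{talkabout} \; x) y)))$

$\downarrow \beta$

$(\lambda Q \, (\exists(\lambda x (\wedge (\mathrm{sentence} \; x)(Q x)))))(\lambda x ((\mathrm{talkabout} \; x) x))$

$\downarrow \beta$

$(\exists(\lambda x (\wedge (\mathrm{sentence} \; x)((\mathrm{talkabout} \; x) x))))$

If we recall that the $x$ in this last term is of type $e$, the latter reduced term represents
the following formula in predicate calculus:

$\exists x : e \, (\mathrm{sentence}(x) \wedge \mathrm{talkabout}(x, x))$

which is the semantic representation of the previously analyzed sentence.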
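
The computation above can be replayed on a small finite model by writing the lexical $\lambda$-terms as Python functions and letting Python's own function application play the role of $\beta$-reduction. Everything about the model (the entities, which of them are sentences, who talks about whom) is invented for the illustration, and the function name `evaluate` is hypothetical.

```python
def evaluate(entities, sentence_set, talk_pairs):
    """Truth value of 'some sentences talk about themselves' in a finite model."""
    # Logical constants, interpreted over the finite domain `entities`.
    exists = lambda pred: any(pred(x) for x in entities)   # (e -> t) -> t
    conj = lambda p: lambda q: p and q                      # t -> (t -> t)

    # Lexical lambda-terms, curried exactly as in the table of the example.
    some = lambda P: lambda Q: exists(lambda x: conj(P(x))(Q(x)))
    sentences = lambda x: x in sentence_set
    talkabout = lambda x: lambda y: (x, y) in talk_pairs
    themselves = lambda P: lambda x: P(x)(x)

    # ((some sentences) (themselves talkabout))
    return some(sentences)(themselves(talkabout))


domain = ["s1", "s2", "rock"]
print(evaluate(domain, {"s1", "s2"}, {("s1", "s1"), ("s2", "rock")}))   # True
print(evaluate(domain, {"s1", "s2"}, {("s2", "rock")}))                 # False
```

The first call is true because the entity s1 is a sentence and talks about itself, which is what the reduced formula requires.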






8 Rigid Lambek Grammars 

In the present section we introduce the notion of rigid Lambek grammar (often referred
to as RLG), whose learnability properties will be the subject of our inquiry in a later section.
Basic notions and results presented here are almost trivial extensions of what has already
been done for rigid CCGs (see [Kan98]), since a specific theory for rigid Lambek
grammars is still missing.

8.1 Rigid and k- Valued Lambek Grammars 

A rigid Lambek grammar is a triple G — (S, s, F), where S and s are defined like in definition 
I5.19L while F : S ^ Tp is a partial function that assigns to each symbol of the alphabet at 
most one type. We can easily generalize the notion of rigid Lambek grammar to the notion 
of k-valued Lambek grammar by a function F that assigns to each symbol of the alphabet 
at most k types. Formally, F : E Uf=i Tp'^. 

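Let an alphabet $\Sigma$ be given. We call $\mathcal{G}_{rigid}$ the class of rigid Lambek grammars over $\Sigma$, and $\mathcal{G}_{k\text{-}valued}$ the class of k-valued Lambek grammars over $\Sigma$.

Let's define two classes of proof-tree structures:

$$
\mathcal{PL}_{rigid} = \{PL(G) \mid G \in \mathcal{G}_{rigid}\}, \qquad
\mathcal{PL}_{k\text{-}valued} = \{PL(G) \mid G \in \mathcal{G}_{k\text{-}valued}\}.
$$

Members of $\mathcal{PL}_{rigid}$ are called rigid (proof-tree) structure languages, and members of $\mathcal{PL}_{k\text{-}valued}$ are called k-valued (proof-tree) structure languages.

Let's define two classes of strings:

$$
\mathcal{L}_{rigid} = \{L(G) \mid G \in \mathcal{G}_{rigid}\}, \qquad
\mathcal{L}_{k\text{-}valued} = \{L(G) \mid G \in \mathcal{G}_{k\text{-}valued}\}.
$$

Members of $\mathcal{L}_{rigid}$ are called rigid (string) languages, and members of $\mathcal{L}_{k\text{-}valued}$ are called k-valued (string) languages.

Example 8.1 Let $\{\mathrm{well}, \mathrm{Francesca}, \mathrm{dances}\} \subseteq \Sigma$ and let $G_1$, $G_2$ be the following Lambek grammars:

$$
\begin{array}{lrcl}
G_1 : & \mathrm{Francesca} & \mapsto & x, \\
      & \mathrm{dances}    & \mapsto & x\backslash s,\ y, \\
      & \mathrm{well}      & \mapsto & y\backslash(x\backslash s), \\[4pt]
G_2 : & \mathrm{Francesca} & \mapsto & x, \\
      & \mathrm{dances}    & \mapsto & x\backslash s, \\
      & \mathrm{well}      & \mapsto & (x\backslash s)\backslash(x\backslash s).
\end{array}
$$

Then $G_2$ is a rigid grammar, while $G_1$ is not. $G_1$ is a 2-valued grammar.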
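To make the distinction concrete, here is a small sketch that checks the two grammars of Example 8.1. It assumes a hypothetical representation in which a grammar is a dictionary from words to sets of types, with types kept as plain strings; `valuedness` and `is_rigid` are illustrative names, not notions defined in this report beyond "k-valued" and "rigid".

```python
# Hypothetical representation: a grammar maps each word to the set of types
# assigned to it; types are kept as plain strings in this sketch.

def valuedness(grammar):
    """Smallest k such that the grammar is k-valued (0 for an empty lexicon)."""
    return max((len(types) for types in grammar.values()), default=0)

def is_rigid(grammar):
    """A grammar is rigid when every word receives at most one type."""
    return valuedness(grammar) <= 1

G1 = {"Francesca": {"x"}, "dances": {"x\\s", "y"}, "well": {"y\\(x\\s)"}}
G2 = {"Francesca": {"x"}, "dances": {"x\\s"}, "well": {"(x\\s)\\(x\\s)"}}

print(is_rigid(G1), valuedness(G1))
print(is_rigid(G2), valuedness(G2))
```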

Definition 8.2 Any type $A$ can be written uniquely in the following form:

$$(\ldots((p \mid A_1) \mid A_2) \mid \ldots) \mid A_n,$$

where $B \mid C$ stands for either $B/C$ or $C\backslash B$ and $p \in Pr$. For $0 \le i \le n$, we call the subtype $(\ldots((p \mid A_1) \mid A_2) \mid \ldots) \mid A_i$ of $A$ a head subtype of $A$. $p$ is the head of $A$ and is denoted $head(A)$.

The $A_i$'s are called argument subtypes of $A$. The number $n$ is called the arity of $A$.
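The decomposition of definition 8.2 can be computed by peeling arguments off from the outside. The sketch below assumes a hypothetical tuple encoding of types: a primitive type or variable is a string, `("/", B, A)` stands for $B/A$ and `("\\", A, B)` stands for $A\backslash B$; the function name `decompose` is illustrative.

```python
# Assumed encoding (not from the report): a primitive type or variable is a str;
# ("/", B, A) is B/A (result B, argument A); ("\\", A, B) is A\B (argument A, result B).

def decompose(ty):
    """Return (head, [A_1, ..., A_n]) for a type of the form (...((p|A_1)|A_2)...)|A_n."""
    args = []
    while not isinstance(ty, str):
        op, left, right = ty
        if op == "/":
            result, arg = left, right
        else:
            arg, result = left, right
        args.append(arg)      # the outermost argument A_n is collected first
        ty = result
    args.reverse()            # restore the order A_1, ..., A_n
    return ty, args

# The type ((sn\s)/sn)\(sn\s) assigned to "themselves" in the previous section.
themselves = ("\\", ("/", ("\\", "sn", "s"), "sn"), ("\\", "sn", "s"))
head, args = decompose(themselves)
print(head, len(args))
```

Here the arity is the length of the returned list, and the head subtypes are obtained by stopping the peeling loop early.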

The following propositions are almost trivial extensions to rigid Lambek grammars of analogous results proved by Kanazawa for CCGs in [Kan98]. However, they deserve some attention since they provide a first insight into the properties of RLGs.

First of all we prove a hierarchy theorem about strong generative capacity of k-valued 
Lambek grammars. 

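Proposition 8.3 Let $a \in \Sigma$. For each $i \ge 1$, let $T_i$ be the following proof-tree structure:

[Figure: the proof-tree structure $T_i$.]

Then for each $k \ge 1$, $\{T_1, \ldots, T_k\}$ is included in some member of $\mathcal{PL}_{(k+1)\text{-}valued}$ but in no member of $\mathcal{PL}_{k\text{-}valued}$.

Thus, for each $k \in \mathbb{N}$, $\mathcal{PL}_{k\text{-}valued} \subset \mathcal{PL}_{(k+1)\text{-}valued}$.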

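Proof. (See [Kan98].) Let $G_k$ be the following $(k+1)$-valued grammar:

$$
G_k : a \mapsto x,\quad s/x,\quad (s/x)/x,\quad \ldots,\quad \underbrace{(\ldots((s/x)/x)\ldots)/x}_{k \text{ times}}.
$$

Then one can easily verify that $\{T_1, \ldots, T_k\} \subseteq PL(G_k)$.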
Let $G$ be a grammar such that $\{T_1, \ldots, T_k\} \subseteq PL(G)$: we will show that $G$ is at least $(k+1)$-valued.

Let $\mathcal{P}_i$ be a parse of $T_i$ in $G$ for $1 \le i \le k$. Then the leftmost leaf of $\mathcal{P}_i$ is the ultimate functor of $\mathcal{P}_i$, and if we call $A_i$ the type labeling it, we can easily verify that its arity must be exactly $i$. Thus, $i \ne j$ implies $A_i \ne A_j$.

We show that there is at least one type $B$ such that $G : a \mapsto B$ and $B \notin \{A_1, \ldots, A_k\}$. Since the relation "is an argument subtype of" is well-founded, there is at least one $i$ such that the argument subtypes of $A_i$ are not in $\{A_1, \ldots, A_k\}$. But in order to produce $\mathcal{P}_i$, any argument subtype of $A_i$ must be a type assigned to $a$ by $G$. Therefore $G$ must be at least $(k+1)$-valued.

The proof of proposition 8.3 shows:

Corollary 8.4 There is no Lambek grammar $G$ such that $PL(G) = \Sigma^{P}$.

Lemma 8.5 Let $G$ be a rigid Lambek grammar. Then for each proof-tree structure $T$, there is at most one partial parse tree $\mathcal{P}$ such that $T$ is the structure of $\mathcal{P}$.

Proof. By induction on the construction of $T$.

Induction basis. $T = c \in \Sigma$. Any partial parse tree $\mathcal{P}$ whose structure is $T$ is a tree of height 0 whose only node is labeled by the symbol $c$ and a type $A$ such that $G : c \mapsto A$. Since $G$ is rigid, there is at most one such type $A$. Then $\mathcal{P}$, if it exists, is unique.

Induction step. There are 4 cases to consider:

1. $T$ is the proof-tree structure $\backslash E(T_1, T_2)$. Then any partial parse tree $\mathcal{P}$ of $G$ whose structure is $T$ has the form

$$
\dfrac{\begin{array}{c}\mathcal{P}_1\\ A\end{array} \qquad \begin{array}{c}\mathcal{P}_2\\ A\backslash B\end{array}}{B}\,[\backslash E]
$$

where $\mathcal{P}_1$ and $\mathcal{P}_2$ are partial parse trees of $G$ whose structures are $T_1$ and $T_2$, respectively. By induction hypothesis, $\mathcal{P}_1$ and $\mathcal{P}_2$ are unique. This means that the type label $B$ is also uniquely determined, so $\mathcal{P}$ is also unique.

2. Exactly like Case 1, with $/E$ in place of $\backslash E$.

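3. $T$ is the proof-tree structure $\backslash I(T_1)$. Then any partial parse tree $\mathcal{P}$ of $G$ whose structure is $T$ has the form

$$
\dfrac{\begin{array}{c}[A], \Gamma\\ \mathcal{P}_1\\ B\end{array}}{A\backslash B}\,[\backslash I]
$$

where $\mathcal{P}_1$ is a partial parse tree of $G$ whose structure is $T_1$. By induction hypothesis, $\mathcal{P}_1$ is unique. This means that the type label $A\backslash B$ is uniquely determined, so $\mathcal{P}$ is also unique.

4. Exactly like Case 3, with $/I$ in place of $\backslash I$.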

Corollary 8.6 If $G$ is a rigid Lambek grammar, each proof-tree structure $T \in PL(G)$ has a unique parse.

Note that the last corollary does not state that if $G$ is rigid, then each string $s \in L(G)$ has a unique parse: in general, for each sentence there are infinitely many proof trees, as extensively shown earlier.

Lemma 8.7 Let $G$ be a rigid Lambek grammar. Then for each incomplete proof-tree structure $T$, there is at most one incomplete parse tree $\mathcal{P}$ of $G$ such that $T$ is the structure of $\mathcal{P}$.

Proof. See [Kan98], trivially extended to Lambek grammars.

8.2 Most General Unifiers and $\sqcup$ Operator

Unification plays a crucial role in automated theorem proving in classical first-order logic and its extensions (see, for example, [Fit96] for an exposition of its use in first-order logic). Since types are just a special kind of terms, the notion of unification applies straightforwardly to types.

Definition 8.8 Let $A$ and $B$ be types. A substitution $\sigma$ is a unifier of $A$ and $B$ if $\sigma(A) = \sigma(B)$. A unifier $\sigma$ is a most general unifier of $A$ and $B$ if, for any other unifier $\tau$ of $A$ and $B$, there exists a substitution $\eta$ such that $\tau = \sigma \circ \eta$, i.e. $\tau(C) = \eta(\sigma(C))$, for $C = A$ or $C = B$.

A substitution $\sigma$ is said to unify a set $\mathcal{A}$ of types if for all $A_1, A_2 \in \mathcal{A}$, $\sigma(A_1) = \sigma(A_2)$. We say that $\sigma$ unifies a family of sets of types if $\sigma$ unifies each set in the family.
A most general unifier is unique up to 'renaming of variables'.

Example 8.9 Let $\mathcal{A}$ consist of the following sets:

$$
\begin{array}{rcl}
\mathcal{A}_1 &=& \{x_1/x_2,\ x_3/x_4\}, \\
\mathcal{A}_2 &=& \{x_5\backslash(x_3\backslash t)\}, \\
\mathcal{A}_3 &=& \{x_1\backslash t,\ x_5\}.
\end{array}
$$

Then the most general unifier of $\mathcal{A}$ is:

$$\sigma = \{x_3 \mapsto x_1,\ x_4 \mapsto x_2,\ x_5 \mapsto x_1\backslash t\}.$$

There are many different efficient algorithms for unification, which decide whether a finite set of types has a unifier and, if it does, compute a most general unifier for it. For illustration purposes, we present here a non-deterministic version of a unification algorithm.

Our algorithm uses the notion of disagreement pair. The easiest way to define a disagreement pair is to consider the types to be tree-like:

Definition 8.10 Let $A$ and $B$ be two types. A disagreement pair for $A$ and $B$ is a pair of subterms $A'$, $B'$ of $A$ and $B$ such that $A' \ne B'$ and the path from the root of $A$ to the root of $A'$ is equal to the path from the root of $B$ to the root of $B'$.

The following non-deterministic version of the unification algorithm is taken from [Fit96]:

Unification Algorithm.

• input: two types $A$ and $B$;

• output: a most general unifier $\sigma$ of $A$, $B$, if it exists, or a correct statement that $A$ and $B$ are not unifiable.

Let $\sigma := \varepsilon$ (the empty substitution)

While $\sigma(A) \ne \sigma(B)$ do
begin

    choose a disagreement pair $A'$, $B'$ for $\sigma(A)$, $\sigma(B)$;

    if neither $A'$ nor $B'$ is a variable, then FAIL;

    let $x$ be whichever of $A'$, $B'$ is a variable (if both are, choose one)

    and let $C$ be the other one of $A'$, $B'$;

    if $x$ occurs in $C$, then FAIL;

    let $\sigma := \sigma \circ \{x \mapsto C\}$;

end

The previous algorithm is one of many algorithms for unification, so the following is a well-defined notion:

Definition 8.11 We define a computable partial function $mgu$ that maps a finite family $\mathcal{A}$ of finite sets of types to a most general unifier $mgu(\mathcal{A})$, if $\mathcal{A}$ is unifiable.

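The set $\mathcal{G}_{rigid}$ of all rigid Lambek grammars is partially ordered by $\sqsubseteq$.

Definition 8.12 Let $\mathcal{G} \subseteq \mathcal{G}_{rigid}$, and let $G \in \mathcal{G}_{rigid}$. Then $G$ is called an upper bound of $\mathcal{G}$ if for every $G' \in \mathcal{G}$, $G' \sqsubseteq G$.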

We introduce here a new operator among rigid grammars that will be used to prove an 
interesting property for our learning algorithm at the end of the fifth chapter. 

Definition 8.13 Let $G_1$ and $G_2$ be rigid Lambek grammars. We can assume that $G_1$ and $G_2$ have no common variables (if they do, we can always choose a suitable alphabetic variant of one of them such that $Var(G_1) \cap Var(G_2) = \emptyset$). Let

$$\mathcal{A} = \{\{A \mid G_1 \cup G_2 : c \mapsto A\} \mid c \in dom(G_1 \cup G_2)\}$$

and let $\sigma = mgu(\mathcal{A})$. Then

$$G_1 \sqcup G_2 = \sigma[G_1 \cup G_2].$$

If $\mathcal{A}$ is not unifiable, then $G_1 \sqcup G_2$ is undefined.



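Example 8.14 Let $G_1$ and $G_2$ be the following rigid Lambek grammars:

$$
\begin{array}{lrcl}
G_1 : & a & \mapsto & s/x, \\
      & b & \mapsto & x, \\[4pt]
G_2 : & b & \mapsto & y\backslash s, \\
      & c & \mapsto & y.
\end{array}
$$

Then

$$
\begin{array}{lrcl}
G_1 \sqcup G_2 : & a & \mapsto & s/(y\backslash s), \\
                 & b & \mapsto & y\backslash s, \\
                 & c & \mapsto & y.
\end{array}
$$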
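As a worked check of Example 8.14, the following block only spells out definition 8.13 on those two grammars; nothing beyond the definition is assumed.

```latex
\begin{align*}
G_1 \cup G_2 &: a \mapsto s/x, \quad b \mapsto x,\ y\backslash s, \quad c \mapsto y \\
\mathcal{A} &= \{\{s/x\},\ \{x,\ y\backslash s\},\ \{y\}\} \\
\sigma = mgu(\mathcal{A}) &= \{x \mapsto y\backslash s\} \\
\sigma[G_1 \cup G_2] &: a \mapsto s/(y\backslash s), \quad b \mapsto y\backslash s, \quad c \mapsto y
\end{align*}
```

Only the set collected for $b$ contains more than one type, so it is the only one that contributes a binding to $\sigma$.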



Obviously, from definition 8.13, we have

Lemma 8.15 If $G_1 \sqcup G_2$ exists, then $G_1 \sqsubseteq G_1 \sqcup G_2$ and $G_2 \sqsubseteq G_1 \sqcup G_2$.

Proposition 8.16 (Kanazawa, 1998) Let $G_1, G_2 \in \mathcal{G}_{rigid}$. If $\{G_1, G_2\}$ has an upper bound, then $G_1 \sqcup G_2$ exists and it is the least upper bound of $\{G_1, G_2\}$.

Proof. See [Kan98].

9 Learning Rigid Lambek Grammars from Structures

In the present chapter we explore a model of learning for rigid Lambek grammars based on positive structured data. Unlike the standard model, where sentences are presented to the learner as flat sequences of words, in this somewhat enriched model strings come with additional information about their "deep structure". Following the approach sketched earlier, which is largely indebted to Tiede's study of proof trees in Lambek calculus as grammatical structures for Lambek grammars (see [Tie99]), in our model each sentence comes to the learner with a structure in the form of a proof tree structure, as extensively described earlier.

Formally, given a finite alphabet $\Sigma$, we will present a learning algorithm for the grammar system $(\mathcal{G}_{rigid}, \Sigma^{P}, PL)$: that is to say, the samples to which the learner is exposed are proof-tree structures over the alphabet $\Sigma$, and guesses are made about the rigid Lambek grammars that can generate such a set of structures.

We follow the advice of Kanazawa (see [Kan98]), who underlines how such an approach, which turns out to be logically independent from an approach based on flat strings of words, seems to make the task of learning easier but doesn't trivialize it. If, on the one hand, in the process of learning from structures the learner is provided with more information, on the other hand the criterion for successful learning is stricter. It is not sufficient that the string language of $G$ contains exactly the yields of the structures in the input sequence: the learning function is required to converge to a grammar $G$ that generates all the grammatical structures which appear in the input sequence. We could say that the learning function must converge to a grammar that is both weakly and strongly equivalent to the grammar that generated the input samples.

Clearly, from a psycholinguistic point of view, both learning from flat strings and learning from proof tree structures are quite unrealistic models of first language acquisition by human beings. In the first case, experimental evidence (see [Pin94]) shows that children cannot acquire a language simply by passively listening to flat strings of words. First of all, we can think that prosody (or punctuation, in written text) provides the child with "structural" information on the syntactic bracketing of the sentences she is exposed to (although the two do not always coincide), and it is known that a child needs prosody to learn a language. Furthermore, another piece of evidence that a child needs something more to learn her mother tongue is the fact that children cannot improve their grammatical skills during the early stages of their language acquisition process by watching TV: it seems very likely that they need "richer data" than simple sentences uttered by an adult. Some researchers (see [Tel99]) hypothesize that this additional information comes to the child as the semantic content of the first sentences she is exposed to, of which she could have a first, primitive grasp through her first sensory-motor experiences.

On the other hand, it is also highly unlikely that a child can have access to something like a proof tree structure of the sentence she is exposed to. Our belief is that a good formal model for the process of learning should rely on something "halfway" between flat strings of words and the highly structured and complete information coming from the proof tree structure of the sentence. However, since, as we have already seen, proof tree structures provide a very natural support for a Montague-like semantics, we think that our model for learning a rigid Lambek grammar from structured data represents a first, simple but meaningful approximation of a more plausible model of learning.

In any case, even though in most real-world applications only unstructured data are available, we are often interested not only in the sentences that a grammar derives, but also in the derivations that the grammar assigns to those sentences. That is, we generally want a grammar that makes structural sense.

9.1 Grammatical Inference as Unification 

We set our inquiry into the learnability of rigid Lambek grammars in the more general logical framework of the Theory of Unification. We will stick to the approach described in [Nic99], based on the attempt to reduce the process of inferring a categorial grammar to the problem of unifying a set of terms. This approach establishes a fruitful connection between Inductive Logic Programming techniques and the field of Grammatical Inference, a connection that has already proved successful in devising efficient algorithms to infer k-valued CCGs from positive structured data (see [Kan98]). Our aim is to exploit as much as possible what has already been done in this direction by exploring the possibility of adapting existing algorithms for CCGs to rigid Lambek grammars.

9.2 Argument Nodes and Typing Algorithm 

Our learning algorithm is based on a process of labeling the nodes of a set of proof tree structures. We introduce here the notion of argument node for a normal form proof tree. We will be a bit sloppy in defining such a notion: sometimes we will use the same notation to indicate a node and the type it is labeled by, when this doesn't engender confusion, and much will be left to the "graphical" interpretation of trees and their nodes. However, we can always think of a node as a De Bruijn-like object (see [dB72]) without substantially affecting the meaning of what will be proved.

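Definition 9.1 Let $\mathcal{P}$ be a normal form partial parse tree. Let's define inductively the set $Arg(\mathcal{P})$ of argument nodes of $\mathcal{P}$. There are three cases to consider:

• $\mathcal{P}$ is a single node labeled by a type $x$, which is the only member of $Arg(\mathcal{P})$.

• $\mathcal{P}$ looks like one of the following

$$
\dfrac{\begin{array}{c}\Gamma\\ \mathcal{P}_1\\ A\end{array} \qquad \begin{array}{c}\Delta\\ \mathcal{P}_2\\ A\backslash B\end{array}}{B}\,[\backslash E]
\qquad\qquad
\dfrac{\begin{array}{c}\Gamma\\ \mathcal{P}_1\\ B/A\end{array} \qquad \begin{array}{c}\Delta\\ \mathcal{P}_2\\ A\end{array}}{B}\,[/E]
$$

then in the first case

$$Arg(\mathcal{P}) = \{Root(\mathcal{P})\} \cup Arg(\mathcal{P}_1) \cup Arg(\mathcal{P}_2) - \{Root(\mathcal{P}_2)\},$$

and in the second case

$$Arg(\mathcal{P}) = \{Root(\mathcal{P})\} \cup Arg(\mathcal{P}_1) \cup Arg(\mathcal{P}_2) - \{Root(\mathcal{P}_1)\}.$$

• $\mathcal{P}$ looks like one of the following

$$
\dfrac{\begin{array}{c}[A], \Gamma\\ \mathcal{P}_1\\ B\end{array}}{A\backslash B}\,[\backslash I]
\qquad\qquad
\dfrac{\begin{array}{c}\Gamma, [A]\\ \mathcal{P}_1\\ B\end{array}}{B/A}\,[/I]
$$

then $Arg(\mathcal{P}) = Arg(\mathcal{P}_1)$.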

The following proposition justifies our interest for argument nodes for a normal form 
proof tree structure: 

Proposition 9.2 Let $t$ be a well formed normal form proof tree structure. If each argument node is labeled, then any other node in $t$ can be labeled with one and only one type.

Proof. We prove that, once argument nodes are labeled, any other node can be labeled, by providing a typing algorithm; uniqueness of typing follows from the rules applied.

By induction on the height $h$ of $t$:

Induction basis. There are two cases to consider:

1. $h = 0$. Trivially, by definition 9.1, $t$ is a single argument node, the result of the application of a single axiom rule [ID], and by definition it is already typed.

2. $h = 1$. Then $t$ must be the result of a single application of a $[/E]$ or $[\backslash E]$ rule. By hypothesis and definition 9.1, its two argument nodes are labeled with, say, $x_1$ and $x_2$, and the remaining node must be labeled according to one of the following rules:

$$
\dfrac{x_1 \qquad x_1\backslash x_2}{x_2}\,[\backslash E]
\qquad\qquad
\dfrac{x_2/x_1 \qquad x_1}{x_2}\,[/E]
$$

Induction step. Let $t$ be a normal form proof tree structure of height $h > 1$. There are 3 cases to consider:

1. $t = \backslash E(t_1, t_2)$. Since, by hypothesis, each node in $Arg(t) = \{Root(t)\} \cup Arg(t_1) \cup Arg(t_2) - \{Root(t_2)\}$ is labeled, $Root(t)$ is also labeled with, say, $x_2$. For the same reason, any node of $Arg(t_1)$ is labeled, too, and so, by induction hypothesis, $t_1$ is fully (and uniquely) labeled. In particular its root is labeled with, say, $x_1$. Since $t$ is well formed, $t_2$ cannot be the result of the application of a $[/I]$ rule, and since $t$ is normal, $t_2$ cannot be the result of the application of a $[\backslash I]$ rule, so its root node is one of its argument nodes, too. By hypothesis, each node in $Arg(t_2) - \{Root(t_2)\}$ has a type, so we can apply the following rule:

$$
\dfrac{x_1 \qquad x_1\backslash x_2}{x_2}\,[\backslash E]
$$

and $t_2$ has all of its argument nodes (uniquely) labeled. So, by induction hypothesis, it is fully and uniquely labeled, and so is $t$.

2. $t = /E(t_1, t_2)$. Analogous to case 1.

3. $t = \backslash I(t_1)$ or $t = /I(t_1)$. By definition, $Arg(t) = Arg(t_1)$, so by hypothesis any argument node in $t_1$ is labeled. Then, by induction hypothesis, $t_1$ is fully (and uniquely) labeled, with root labeled, say, $x_1$, and since $t$ is well formed, there must be at least two undischarged leaves in $t_1$. So $t$ can be fully labeled according, respectively, to the following rules:

$$
\dfrac{x_1}{x_2\backslash x_1}\,[\backslash I]
\qquad\qquad
\dfrac{x_1}{x_1/x_2}\,[/I]
$$

where $x_2$ labels, respectively, the leftmost and the rightmost undischarged leaf.

The proof of the previous proposition has implicitly defined an algorithm for labeling in 
the most general way the nodes of a normal form proof tree structure. 

Definition 9.3 A principal parse of a proof tree structure t is a partial parse tree T of t, 
such that for any other partial parse tree T' of t, there exists a substitution a such that, if 
a node oft is labeled by type A in T, it's labeled by cr{A) in T' . 

From the proof of proposition 9.2 it is easy to devise an algorithm to get a principal parse for any well formed normal form proof tree structure.

Principal Parse Algorithm

• Input: a well formed normal form proof tree structure $t$;

• Output: a principal parse $T$ of $t$ in a Lambek grammar $G$.

Step 1. Label each argument node in $t$ with a distinct variable;

Step 2. Compute the types for the remaining nodes according to the rules described in the proof of proposition 9.2.

Obviously, this algorithm always terminates. If $T$ is the resulting parse, we can easily prove that it is principal. If $T'$ is another parse for $t$, let's define a substitution $\sigma$ in the following way: for each variable $x \in Var(G)$, find the (unique, by construction) node in $T$ labeled by $x$, and let $\sigma(x)$ be the type labeling the same node in $T'$. By induction on $A \in Tp(G)$ (where $Tp(G)$ is the set of all subtypes appearing in a Lambek grammar $G$), we prove that

if $A$ labels a node of $T$, $\sigma(A)$ labels the corresponding node of $T'$.

Induction basis. If $A \in Var$, this holds by definition.

Induction step. Let $A = B\backslash C$ label a node of $T$. Then the relevant part of $T$ must look like one of the following cases:

First case:

$$
\dfrac{B \qquad B\backslash C}{C}\,[\backslash E]
$$

By induction hypothesis, the corresponding part of $T'$ looks like

$$
\dfrac{\sigma(B) \qquad A'}{\sigma(C)}\,[\backslash E]
$$

Then $A' = \sigma(B)\backslash\sigma(C) = \sigma(B\backslash C) = \sigma(A)$.

Second case:

$$
\dfrac{\begin{array}{c}[B]\\ \vdots\\ C\end{array}}{B\backslash C}\,[\backslash I]
$$

By induction hypothesis, the corresponding part of $T'$ looks like

$$
\dfrac{\begin{array}{c}[\sigma(B)]\\ \vdots\\ \sigma(C)\end{array}}{A'}\,[\backslash I]
$$

Then $A' = \sigma(B)\backslash\sigma(C) = \sigma(B\backslash C) = \sigma(A)$.

The case $A = C/B$ is entirely similar, thus completing the induction. It follows that if a node of $T$ is labeled by $A$, then the corresponding node of $T'$ is labeled by $\sigma(A)$. That is to say, with a small abuse of notation, $T' = \sigma(T)$.
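The principal parse algorithm of this section can be sketched in a few lines. The encoding below is an assumption made for illustration only: a word leaf is a string, a discharged hypothesis leaf is `None`, `("\\E", argument, functor)` and `("/E", functor, argument)` are elimination nodes, and `("\\I", body)` and `("/I", body)` are introduction nodes; types are tuples `("/", B, A)` for $B/A$ and `("\\", A, B)` for $A\backslash B$. The optional `root` parameter is a convenience that fixes the type of the root instead of giving it a fresh variable.

```python
import itertools

HYP = None  # a discharged hypothesis leaf, written [ ] in the structures

def principal_parse(structure, root=None):
    """Return (type of the root, list of (word, type) pairs for the word leaves)."""
    counter = itertools.count(1)
    lexicon = []

    def fresh():
        return "x" + str(next(counter))

    def label(t, given):
        # Returns the type of t and its undischarged leaves, left to right,
        # as (word or HYP, type) pairs. `given` is None for argument nodes.
        if t is HYP or isinstance(t, str):
            ty = given or fresh()
            if t is not HYP:
                lexicon.append((t, ty))
            return ty, [(t, ty)]
        rule = t[0]
        if rule in ("\\E", "/E"):
            res = given or fresh()
            arg, fun = (t[1], t[2]) if rule == "\\E" else (t[2], t[1])
            arg_ty, arg_leaves = label(arg, None)          # argument first
            fun_ty = ("\\", arg_ty, res) if rule == "\\E" else ("/", res, arg_ty)
            _, fun_leaves = label(fun, fun_ty)             # functor type is computed
            if rule == "\\E":
                return res, arg_leaves + fun_leaves
            return res, fun_leaves + arg_leaves
        if rule not in ("\\I", "/I"):
            raise ValueError("unknown rule: " + str(rule))
        if given is not None:
            raise ValueError("introduction in functor or root position")
        body_ty, leaves = label(t[1], None)
        if len(leaves) < 2:
            raise ValueError("an introduction needs at least two undischarged leaves")
        hyp = leaves[0] if rule == "\\I" else leaves[-1]
        if hyp[0] is not HYP:
            raise ValueError("the discharged leaf must be a hypothesis")
        if rule == "\\I":
            return ("\\", hyp[1], body_ty), leaves[1:]
        return ("/", body_ty, hyp[1]), leaves[:-1]

    ty, _ = label(structure, root)
    return ty, lexicon

# Illustrative structure for "a girl loves John".
t = ("\\E", ("/E", "a", "girl"), ("/E", "loves", "John"))
root_type, lex = principal_parse(t)
print(root_type, dict(lex))
```

Typing the argument subtree before the functor subtree is what makes the functor's type computable: by the time the functor is visited, both the argument type and the result type are known.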



9.3 RLG Algorithm 

Our algorithm (called RLG, from Rigid Lambek Grammar) takes as its input a finite set $D$ of proof tree structures over a finite alphabet $\Sigma$ and returns a rigid Lambek grammar $G$ over the same alphabet whose structure language (properly) contains $D$, if one exists, and otherwise a correct statement that there is no such rigid Lambek grammar.

Our algorithm is based on the typing algorithm described in section 9.2 and on the unification algorithm described in section 8.2.

RLG Algorithm.

• input: a finite set $D$ of proof tree structures.

• output: a rigid Lambek grammar $G$ such that $D \subseteq PL(G)$, if there is one.

We illustrate the algorithm using the following example:

[Figure: the input set $D$ of two proof tree structures; one has leaves a, girl, loves, [ ], him, passionately, the other has leaves a, girl, loves, John.]

Step 1. Normalize all the proof tree structures in $D$, if they are not normal, according to the normalization rules described earlier.

Step 2. Assign a type to each node of the structures in $D$ as follows:

1. Assign $s$ to each root node.

2. Assign distinct variables to the argument nodes.

3. Compute types for the remaining nodes according to the rules described in the proof of proposition 9.2.

[Figure: the structures of $D$ with every node labeled by the type computed in Step 2.]

Step 3. Collect the types assigned to the leaf nodes into a grammar $GF(D)$ called the general form induced by $D$. In general, $GF(D) : c \mapsto A$ if and only if the previous step assigns $A$ to a leaf node labeled by symbol $c$.

$$
\begin{array}{lrcl}
GF(D) : & \mathrm{passionately} & \mapsto & x_1\backslash s \\
        & \mathrm{him}          & \mapsto & (x_2/x_5)\backslash x_1 \\
        & \mathrm{a}            & \mapsto & x_3/x_4,\ x_6/x_8 \\
        & \mathrm{girl}         & \mapsto & x_4,\ x_8 \\
        & \mathrm{loves}        & \mapsto & (x_3\backslash x_2)/x_5,\ (x_6\backslash s)/x_7 \\
        & \mathrm{John}         & \mapsto & x_7
\end{array}
$$

Step 4. Unify the types assigned to the same symbol. Let $\mathcal{A} = \{\{A \mid GF(D) : c \mapsto A\} \mid c \in dom(GF(D))\}$, and compute $\sigma = mgu(\mathcal{A})$. The algorithm fails if unification fails.

$$\sigma = \{x_7 \mapsto x_3,\ x_8 \mapsto x_4,\ x_6 \mapsto x_3,\ x_2 \mapsto s,\ x_5 \mapsto x_3\}$$

Step 5. Let $RLG(D) = \sigma[GF(D)]$.

$$
\begin{array}{lrcl}
RLG(D) : & \mathrm{passionately} & \mapsto & x_1\backslash s \\
         & \mathrm{him}          & \mapsto & (s/x_3)\backslash x_1 \\
         & \mathrm{a}            & \mapsto & x_3/x_4 \\
         & \mathrm{girl}         & \mapsto & x_4 \\
         & \mathrm{loves}        & \mapsto & (x_3\backslash s)/x_3 \\
         & \mathrm{John}         & \mapsto & x_3
\end{array}
$$

Our algorithm is based on the principal parse algorithm described in the previous section, which has been proved to be correct and to terminate, and on the unification algorithm described in section 8.2. The result is, intuitively, the most general rigid Lambek grammar which can generate all the proof tree structures appearing in the input sequence.

9.4 Properties of RLG 

In the present section we prove some properties of the RLG algorithm that will be helpful 
to study its behaviour in the limit. 

The following lemma is almost trivial but it will play an important role in the convergence proof for the RLG algorithm. It simply states that the tree language of the grammar inferred just after the labeling of the structures properly contains the sample structures.

Lemma 9.4 Let $D$ be the input set of proof tree structures for the RLG algorithm. Then the set of the proof tree structures generated by the 'general form' grammar properly contains $D$. That is, $D \subset PL(GF(D))$.

Proof. Let $D = \{T_1, \ldots, T_n\}$. The labeling of the nodes of the structures in $D$ that precedes the construction of $GF(D)$ in fact forms a parse tree $\mathcal{P}_i$ of $GF(D)$ for each structure $T_i$ in $D$. This shows $D \subseteq PL(GF(D))$. The proper inclusion follows trivially from the fact that $D$ is by hypothesis a finite set, while $PL(G)$, the set of proof tree structures generated by a Lambek grammar $G$, is always infinite.

Lemma 9.5 Each variable $x \in Var(GF(D))$ labels a unique node in a unique parse tree of $D$.

Proof. By construction, if $x \in Var(GF(D))$, then there must be an $i \in \mathbb{N}$ such that $x$ labels one of the nodes of a parse tree $\mathcal{P}_i$. Since, by construction, for each $i \ne j$ the sets of variables that label $\mathcal{P}_i$ and $\mathcal{P}_j$ are disjoint, $x$ appears in one and only one $\mathcal{P}_i$. Besides, since variables are assigned only during the first phase of the type-assignment process of our algorithm, again by construction each variable labels only one node in the deduction tree.

The following lemma makes explicit the relation between the grammar inferred just after 
the labeling of the structures in the algorithm RLG, and the structure language of the rigid 
grammar we are trying to infer. 

Lemma 9.6 Let $D$ be a finite set of proof tree structures. Then, for any Lambek grammar $G$, the following are equivalent:

(i) $D \subseteq PL(G)$;

(ii) there is a substitution $\sigma$ such that $\sigma[GF(D)] \subseteq G$.

Proof. (ii) $\Rightarrow$ (i). Suppose there is a substitution $\sigma$ such that $\sigma[GF(D)] \subseteq G$. Then, by an earlier proposition, we have that $PL(GF(D)) \subseteq PL(G)$. This, together with lemma 9.4, proves (i).

(i) $\Rightarrow$ (ii). Let $D = \{T_1, \ldots, T_n\}$ and let $\mathcal{P}_i$ be $GF(D)$'s parse of $T_i$ for $1 \le i \le n$. Assume $D \subseteq PL(G)$. Then $G$ has a parse $\mathcal{Q}_i$ of each $T_i$. Define a substitution $\sigma$ as follows: for each variable $x \in Var(GF(D))$, find the (unique, due to lemma 9.5) $\mathcal{P}_i$ that contains a (unique, again due to lemma 9.5) node labeled by $x$, and let $\sigma(x)$ be the type labeling the corresponding node of $\mathcal{Q}_i$. We show that

if $A$ labels a node of some $\mathcal{P}_i$, then $\sigma(A)$ labels the corresponding node of $\mathcal{Q}_i$.

Proof. By induction on $A \in Tp(GF(D)) = \{T \mid T$ is a subtype of some $B \in range(GF(D))\}$:

Induction basis. If $A \in Var$, this holds by definition. If $A = s$, then any node labeled by $A$ in $\{\mathcal{P}_1, \ldots, \mathcal{P}_n\}$ is the root node of some $\mathcal{P}_i$. Since $\mathcal{Q}_i$ is a parse tree of $G$, the root node of $\mathcal{Q}_i$ must be labeled by $s$.

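Induction step. Let $A = B\backslash C$ label a node of $\mathcal{P}_i$. Then the relevant part of $\mathcal{P}_i$ must look like one of the two following cases:

• First case

$$
\dfrac{B \qquad B\backslash C}{C}\,[\backslash E]
$$

By induction hypothesis, the corresponding part of $\mathcal{Q}_i$ looks like:

$$
\dfrac{\sigma(B) \qquad A'}{\sigma(C)}\,[\backslash E]
$$

Then $A' = \sigma(B)\backslash\sigma(C) = \sigma(B\backslash C) = \sigma(A)$.

• Second case

$$
\dfrac{\begin{array}{c}[B]\\ \vdots\\ C\end{array}}{B\backslash C}\,[\backslash I]
$$

By induction hypothesis, the corresponding part of $\mathcal{Q}_i$ looks like:

$$
\dfrac{\begin{array}{c}[\sigma(B)]\\ \vdots\\ \sigma(C)\end{array}}{A'}\,[\backslash I]
$$

Then again $A' = \sigma(B)\backslash\sigma(C) = \sigma(B\backslash C) = \sigma(A)$.

The case $A = C/B$ is entirely similar, thus completing the induction. It follows that if $GF(D) : c \mapsto A$, then $G : c \mapsto \sigma(A)$. Therefore, $\sigma[GF(D)] \subseteq G$.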

The following proposition establishes an "if and only if" relation between the inclusion 
of our set of positive samples D in a tree language generated by a rigid grammar G and the 
successful termination of the RLG algorithm when it has D as its input set. Even more, 
we have that the rigid grammar inferred by the algorithm is not "larger" than the rigid 
grammar G. 

Proposition 9.7 Let $D$ be a finite set of proof tree structures. Then, for any rigid grammar $G$, the following are equivalent:

(i) $D \subseteq PL(G)$;

(ii) $RLG(D)$ exists and $RLG(D) \sqsubseteq G$ (equivalently, there is a substitution $\tau$ such that $\tau[RLG(D)] \subseteq G$).

Proof. (ii) $\Rightarrow$ (i) follows from lemma 9.6 and the fact that $RLG(D)$ is a substitution instance of $GF(D)$.

(i) $\Rightarrow$ (ii). Assume that $G$ is a rigid grammar such that $D \subseteq PL(G)$. By lemma 9.6 there is a substitution $\sigma$ such that $\sigma[GF(D)] \subseteq G$. Since $G$ is a rigid grammar, $\sigma[GF(D)]$ is also a rigid grammar. Then $\sigma$ unifies the family $\mathcal{A} = \{\{A \mid GF(D) : c \mapsto A\} \mid c \in dom(GF(D))\}$. This means that $RLG(D)$ exists and $RLG(D) = \sigma_0[GF(D)]$, where $\sigma_0 = mgu(\mathcal{A})$. Then there is a substitution $\tau$ such that $\sigma = \tau \circ \sigma_0$. Therefore, $\tau[RLG(D)] = \tau[\sigma_0[GF(D)]] = (\tau \circ \sigma_0)[GF(D)] = \sigma[GF(D)]$. By assumption, $\sigma[GF(D)] \subseteq G$, so $\tau[RLG(D)] \subseteq G$.

Corollary 9.8 Let $D_1$ and $D_2$ be two finite sets of proof tree structures such that $D_1 \subseteq D_2$. If $RLG(D_2)$ exists, $RLG(D_1)$ also exists, and $RLG(D_1) \sqsubseteq RLG(D_2)$ and $PL(RLG(D_1)) \subseteq PL(RLG(D_2))$.

Proof. Immediate from proposition 9.7, noting that if $D_1 \subseteq D_2$, then $\{G \in \mathcal{G}_{rigid} \mid D_1 \subseteq PL(G)\} \supseteq \{G \in \mathcal{G}_{rigid} \mid D_2 \subseteq PL(G)\}$.

Definition 9.9 Let $\varphi_{RLG}$ be the learning function for the grammar system $(\mathcal{G}_{rigid}, \Sigma^{P}, PL)$ defined as follows:¹

$$\varphi_{RLG}(\langle T_0, \ldots, T_n\rangle) \simeq RLG(\{T_0, \ldots, T_n\}).$$



Thanks to previous propositions and lemmas we are able to prove the convergence for 
the RLG algorithm: 

Theorem 9.10 $\varphi_{RLG}$ learns $\mathcal{G}_{rigid}$ from structures.

Proof. We prove that $\varphi_{RLG}$ learns the class of rigid Lambek grammars from proof tree structures.

Let $G$ be any rigid Lambek grammar and let $(T_i)_{i \in \mathbb{N}}$ be an infinite sequence enumerating $PL(G)$. For each $i \in \mathbb{N}$, $\{T_0, \ldots, T_i\} \subseteq PL(G)$, so by proposition 9.7 $\varphi_{RLG}(\langle T_0, \ldots, T_i\rangle) = RLG(\{T_0, \ldots, T_i\})$ is defined and

$$\varphi_{RLG}(\langle T_0, \ldots, T_i\rangle) \sqsubseteq \varphi_{RLG}(\langle T_0, \ldots, T_{i+1}\rangle),$$

by corollary 9.8, and

$$\varphi_{RLG}(\langle T_0, \ldots, T_i\rangle) \sqsubseteq G.$$

Since, by corollary 6.41, there are only finitely many Lambek grammars $G' \sqsubseteq G$, $\varphi_{RLG}$ must converge on $(T_i)_{i \in \mathbb{N}}$ to some $G'$. Then $PL(G) = \{T_i \mid i \in \mathbb{N}\} \subseteq PL(G')$. Since $G' \sqsubseteq G$, by an earlier proposition $PL(G') \subseteq PL(G)$. Therefore, $PL(G') = PL(G)$.

When RLG is applied successively to an increasing sequence of sets of proof tree structures $D_0 \subseteq D_1 \subseteq D_2 \subseteq \cdots$, it is more efficient to make use of the previous value $RLG(D_{i-1})$ to compute the current value $RLG(D_i)$.

Definition 9.11 If $G$ is a rigid Lambek grammar and $D$ is a finite set of proof tree structures, then let

$$RLG^{(2)}(G, D) \simeq G \sqcup RLG(D).$$

Lemma 9.12 If $D_1$ and $D_2$ are two finite sets of proof tree structures,

$$RLG^{(2)}(RLG(D_1), D_2) \simeq RLG(D_1 \cup D_2).$$

¹ Recall that the symbol $\simeq$ means that either both sides are defined and are equal, or else both sides are undefined.

Proof. (See [Kan98].) Suppose that $RLG^{(2)}(RLG(D_1), D_2)$ is defined. By lemma 8.15, $RLG(D_1) \sqsubseteq RLG^{(2)}(RLG(D_1), D_2)$ and $RLG(D_2) \sqsubseteq RLG^{(2)}(RLG(D_1), D_2)$. This implies that $D_1 \cup D_2 \subseteq PL(RLG^{(2)}(RLG(D_1), D_2))$, so by proposition 9.7 $RLG(D_1 \cup D_2)$ exists and $RLG(D_1 \cup D_2) \sqsubseteq RLG^{(2)}(RLG(D_1), D_2)$.

Suppose now that $RLG(D_1 \cup D_2)$ is defined. By corollary 9.8, $RLG(D_1)$ and $RLG(D_2)$ exist, and $RLG(D_1) \sqsubseteq RLG(D_1 \cup D_2)$ and $RLG(D_2) \sqsubseteq RLG(D_1 \cup D_2)$. Then $RLG(D_1 \cup D_2)$ is an upper bound of $\{RLG(D_1), RLG(D_2)\}$. By proposition 8.16, $RLG(D_1) \sqcup RLG(D_2) = RLG^{(2)}(RLG(D_1), D_2)$ exists and $RLG^{(2)}(RLG(D_1), D_2) \sqsubseteq RLG(D_1 \cup D_2)$.

Thus it has been proved that if one of $RLG^{(2)}(RLG(D_1), D_2)$ and $RLG(D_1 \cup D_2)$ is defined, the other is defined and they are equal.

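Proposition 9.13 $\varphi_{RLG}$ has the following properties:

(i) $\varphi_{RLG}$ learns $\mathcal{G}_{rigid}$ prudently.

(ii) $\varphi_{RLG}$ is responsive and consistent on $\mathcal{G}_{rigid}$.

(iii) $\varphi_{RLG}$ is set-driven.

(iv) $\varphi_{RLG}$ is conservative.

(v) $\varphi_{RLG}$ is monotone increasing.

(vi) $\varphi_{RLG}$ is incremental.

Proof.

(i) Since $range(\varphi_{RLG}) \subseteq \mathcal{G}_{rigid}$, $\varphi_{RLG}$ learns prudently.

(ii) If $D \subseteq L$ for some $L \in \mathcal{PL}_{rigid}$, then by proposition 9.7 $RLG(D)$ exists and by lemma 9.6 $D \subseteq PL(RLG(D))$. This means that $\varphi_{RLG}$ is responsive and consistent on $\mathcal{G}_{rigid}$.

(iii) $\varphi_{RLG}$ is set-driven by definition.

(iv) Let $T \in PL(RLG(D))$. Then $D \cup \{T\} \subseteq PL(RLG(D))$. By proposition 9.7, $RLG(D \cup \{T\})$ exists and $RLG(D \cup \{T\}) \sqsubseteq RLG(D)$. By corollary 9.8 we also have $RLG(D) \sqsubseteq RLG(D \cup \{T\})$. This shows that $\varphi_{RLG}$ is conservative.

(v) Trivial from corollary 9.8.

(vi) Define a computable function $\psi : \mathcal{G}_{rigid} \times \Sigma^{P} \to \mathcal{G}_{rigid}$ as follows:

$$
\psi(G, T) \simeq
\begin{cases}
RLG^{(2)}(G, \{T\}) & \text{if } G \sqcup RLG(\{T\}) \text{ is defined and } RLG(\{T\}) \text{ is defined,} \\
\text{undefined} & \text{otherwise.}
\end{cases}
$$

Then by lemma 9.12, $\varphi_{RLG}(\langle T_0, \ldots, T_{i+1}\rangle) \simeq \psi(\varphi_{RLG}(\langle T_0, \ldots, T_i\rangle), T_{i+1})$.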
10 Conclusion and Further Research

This work aims at making a further step in the direction of bridging the gap which still 
separates any formal/computational theory of learning from a meaningful formal linguistic 
theory. 

We have introduced the basic notions of Formal Learnability Theory as first formulated 
by E.M. Gold in 1967, and of Lambek Grammars, which appeared for the first time in an 
article of 1958. 

The former, which is one of the first completely formal descriptions of the process of
grammatical inference, after an initial skepticism about its effective applicability, is at present
the object of renewed interest due to some meaningful and promising learnability results.

The latter too, long neglected by the linguistic community, is experiencing a strong
renewed interest as a consequence of recent achievements in linguistics which point toward
completely lexicalized formal grammars, as Lambek grammars are. Even if they are still far from
being the ultimate formal device for the formalization of human linguistic competence,
they are universally regarded as a promising tool for further developments of computational
linguistics.

In the present work we have drawn attention to a particular class of Lambek grammars
called rigid Lambek grammars, and we have proved that they are learnable in Gold's framework
from a structured input. We have used recent results by Hans-Joerg Tiede to formally
define our notion of structure for a sentence: he has recently proved that the class of proof tree
languages generated by Lambek grammars strictly contains the class of tree languages generated by
context-free grammars. His notion of a proof as the grammatical structure of a sentence in a
categorial grammar is also useful in providing a natural support to a Montagovian semantics
for that sentence. Therefore, our choice of proof tree structures as the structured input for
our learning algorithm is not gratuitous: it is coherent with the mainstream
of (psycho-)linguistic theories about first language learning, which stress the importance of
providing the learner with informationally and semantically rich input in the process of her
language acquisition.

We believe it to be a partial but meaningful result, which once more shows how versatile
and powerful this learning theory can be, once neglected because it was widely held that it
could account only for the learnability of the most trivial classes of grammars.

Much is left to be done in many directions. First of all, there is still no real theory
of rigid, or k-valued, Lambek grammars: we still know very few formal properties of such
grammars, which seem to have an indisputable linguistic interest. We still lack, for example,
a hierarchy theorem for languages generated by k-valued Lambek grammars.

Another important open question is the decidability of $PL(G_1) \subseteq PL(G_2)$ for Lambek
grammars $G_1, G_2$, that is, whether one can decide, for any two grammars, if the tree language
generated by the first is contained in the tree language generated by the second.
This question is decidable for the non-associative variant of Lambek grammars.
Proving it decidable for Lambek grammars as well would allow us to devise very easily a
learning algorithm for k-valued Lambek grammars.

Our learnability result is in our opinion a first step toward a more convincing and
linguistically plausible model of learning for k-valued Lambek grammars from less and less
structurally rich input. Needless to say, learning from input as informationally rich as
proof tree structures has hardly any linguistic plausibility. On the other hand, the
deep connections between the proof tree structures for a sentence in Lambek grammars and its
"Montague-like" semantics seem to point toward a more convincing model of learning based
on both syntactic and semantic information.